Thanks for the recommendation. I can definitely run this job with the
proposed setting. In addition, I created a patch in
https://issues.apache.org/jira/browse/TEZ-3076 that reduces the memory
needed by MapOutput and InMemoryReader, which allows this job to run
without enabling tez.runtime.shuffle.memory-to-memory.enable. I'll update
the JIRA with the overall reduction per MapOutput entry and per
InMemoryReader.
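
For a sense of scale, the figures from the earlier report quoted below work out roughly as follows. This is a back-of-the-envelope estimate, assuming 1 MB = 10^6 bytes and treating the 50-350 byte range as the data size of each segment:

```latex
\frac{480 \times 10^{6}\ \text{bytes}}{500{,}000\ \text{segments}} = 960\ \text{bytes per segment}
\qquad
960 - 350 = 610 \;\le\; \text{non-data bytes per segment} \;\le\; 960 - 50 = 910
```

On these numbers, something like 600-900 bytes per in-memory segment is going to something other than the segment data itself, which is the kind of per-entry cost that shrinking MapOutput and InMemoryReader would address.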

Please have a look.

Jon


On Wed, Jan 20, 2016 at 5:18 PM, Gopal Vijayaraghavan <[email protected]>
wrote:

>
> > Around 1,000,000 spills were fetched, committing around 100MB to the
> > memory budget (500,000 in memory). However, the actual memory used for
> > 500,000 segments (50-350 bytes each) is 480MB (expected 100-200MB).
>
> This is effectively the problem the mem2merger solves - but is not enabled
> by default.
>
> I noticed that this build-up of >100 segments in memory is generally a bad
> thing, and merging them back into 1 in-memory segment was a significant
> perf boost when producing the iterators for the reducers.
>
> Can you re-run the scenario with in-mem merge enabled and
> io.sort.factor = 100?
>
> Cheers,
> Gopal
>
>
>
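
The in-memory merge Gopal describes can be illustrated with a small sketch. This is hypothetical code, not Tez's merger: the class name InMemoryMergeSketch, the integer keys, and the rule "k-way merge every in-memory segment into one as soon as the count reaches sortFactor" are assumptions chosen to mirror the io.sort.factor = 100 suggestion. The real merger works on serialized map-output segments and its trigger logic may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not Tez code): holds sorted in-memory segments and,
 * once their count reaches sortFactor, k-way merges them into one segment.
 */
public class InMemoryMergeSketch {
    private final int sortFactor;
    private final List<List<Integer>> segments = new ArrayList<>();

    public InMemoryMergeSketch(int sortFactor) {
        this.sortFactor = sortFactor;
    }

    /** Adds a segment whose keys are already sorted ascending. */
    public void add(List<Integer> sortedSegment) {
        segments.add(sortedSegment);
        if (segments.size() >= sortFactor) {
            // Collapse all current segments into a single sorted segment.
            List<Integer> merged = kWayMerge(segments);
            segments.clear();
            segments.add(merged);
        }
    }

    public int segmentCount() {
        return segments.size();
    }

    public List<Integer> segment(int i) {
        return segments.get(i);
    }

    /** K-way merge using a min-heap of {segmentIndex, position} cursors. */
    static List<Integer> kWayMerge(List<List<Integer>> inputs) {
        // Heap is ordered by the key each cursor currently points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> Integer.compare(inputs.get(a[0]).get(a[1]),
                                      inputs.get(b[0]).get(b[1])));
        int total = 0;
        for (int s = 0; s < inputs.size(); s++) {
            total += inputs.get(s).size();
            if (!inputs.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<Integer> out = new ArrayList<>(total);
        while (!heap.isEmpty()) {
            int[] cur = heap.poll();
            List<Integer> seg = inputs.get(cur[0]);
            out.add(seg.get(cur[1]));
            // Advance this cursor if its segment has more keys.
            if (cur[1] + 1 < seg.size()) {
                heap.add(new int[] {cur[0], cur[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        InMemoryMergeSketch pool = new InMemoryMergeSketch(100);
        for (int i = 0; i < 250; i++) {
            pool.add(List.of(i, i + 1000)); // each fetched segment: 2 sorted keys
        }
        System.out.println("in-memory segments: " + pool.segmentCount());
    }
}
```

With sortFactor = 100, feeding in 250 two-key segments leaves 52 segments in memory instead of 250, one of which holds 398 sorted keys, so per-segment bookkeeping is paid for far fewer objects.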
