>  around 1,000,000 spills were fetched committing around 100MB to the
>memory budget (500,000 in memory). However, actual memory used for 500,000
>segments (50-350 bytes) is 480MB (expected 100-200MB)

This is effectively the problem the mem2merger solves, but it is not
enabled by default.

I noticed that a build-up of >100 segments in memory is generally a bad
thing, and that merging them back into 1 in-memory segment was a
significant boost to perf when producing the iterators for the reducers.
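
To illustrate the idea only (this is a toy sketch, not the actual mem2merger
code; merge_segments and its factor argument are hypothetical names, with
factor playing a role loosely analogous to io.sort.factor), collapsing many
small sorted segments into one could look like this:

```python
import heapq


def merge_segments(segments, factor=100):
    """Collapse many sorted in-memory segments into a single sorted one.

    Hypothetical helper: merges at most `factor` segments per pass and
    repeats until only one segment is left.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    segments = [list(s) for s in segments]
    while len(segments) > 1:
        merged = []
        for i in range(0, len(segments), factor):
            batch = segments[i:i + factor]
            # heapq.merge does a k-way merge over already-sorted inputs
            merged.append(list(heapq.merge(*batch)))
        segments = merged
    return segments[0] if segments else []


if __name__ == "__main__":
    # 500 tiny sorted segments collapse into a single sorted segment
    tiny = [[i, i + 500] for i in range(500)]
    combined = merge_segments(tiny, factor=100)
    print(len(combined), combined == sorted(combined))
```

The reducer-side iterator then walks one segment instead of juggling
hundreds of them.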

Can you re-run the scenario with in-mem merge enabled and
io.sort.factor = 100?

Cheers,
Gopal 

