> around 1,000,000 spills were fetched, committing around 100MB to the memory budget (500,000 in memory). However, actual memory used for 500,000 segments (50-350 bytes) is 480MB (expected 100-200MB)
This is effectively the problem the mem2merger solves, but it is not enabled by default. I noticed that this build-up of >100 segments in memory is generally a bad thing, and merging them back into 1 in-memory segment was a significant boost to perf when producing the iterators for the reducers. Can you re-run the scenario with in-memory merge enabled and io.sort.factor = 100?

Cheers,
Gopal
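For reference, a minimal sketch of the suggested re-run configuration. This assumes the Hadoop 2.x MapReduce property names (`mapreduce.reduce.merge.memtomem.enabled` for the reduce-side memory-to-memory merger, and `mapreduce.task.io.sort.factor` as the newer name for `io.sort.factor`); adjust to the property names of the release actually being tested:

```xml
<!-- Sketch only: property names assume Hadoop 2.x MapReduce;
     older releases use io.sort.factor instead of
     mapreduce.task.io.sort.factor. -->
<property>
  <!-- Enable the memory-to-memory merger in the reduce-side shuffle -->
  <name>mapreduce.reduce.merge.memtomem.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Merge up to 100 segments per merge pass -->
  <name>mapreduce.task.io.sort.factor</name>
  <value>100</value>
</property>
```

The same settings can also be passed on the command line for a one-off run, e.g. `-Dmapreduce.reduce.merge.memtomem.enabled=true -Dmapreduce.task.io.sort.factor=100`.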
