You can actually turn off shuffle compression by setting spark.shuffle.compress
to false. Try that out; there will still be some buffers for the various
OutputStreams, but they should be smaller.
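
As a rough sketch, assuming your deployment reads conf/spark-defaults.conf
(the file location is an assumption about your setup, not something specific
to this thread), the setting would look like this:

```
# conf/spark-defaults.conf
# Write shuffle output uncompressed; trades disk and network volume
# for smaller compression buffers.
spark.shuffle.compress   false
```

The same key can also be set programmatically on the SparkConf before the
SparkContext is created, e.g. conf.set("spark.shuffle.compress", "false").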

Matei

On Jul 14, 2014, at 3:30 PM, Stephen Haberman <stephen.haber...@gmail.com> 
wrote:

> 
> Just a comment from the peanut gallery, but these buffers are a real
> PITA for us as well. Probably 75% of our non-user-error job failures
> are related to them.
> 
> Just naively, what about not doing compression on the fly? E.g. during
> the shuffle just write straight to disk, uncompressed?
> 
> For us, we always have plenty of disk space, and if you're concerned
> about network transmission, you could add a separate compress step
> after the blocks have been written to disk, but before being sent over
> the wire.
> 
> Granted, IANAE, so perhaps this is a bad idea; either way, awesome to
> see work in this area!
> 
> - Stephen
> 
