Hi,
I was trying to see whether I could make Spark avoid hitting the disk for small
jobs, but it looks like SortShuffleWriter.write() always writes to disk. I
found an older thread (
http://apache-spark-user-list.1001560.n3.nabble.com/How-does-shuffle-work-in-spark-td584.html)
which says that fsync is not called on this write path.

My question is: why does it always write to disk?
Does that mean the reduce phase reads the result from disk as well?
Couldn't the reduce phase read the data directly from the in-memory map/buffer
in ExternalSorter instead?
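
To make sure I'm asking the right question, here is a small, self-contained
Scala sketch of my mental model of the sort-based shuffle. This is just a toy,
not Spark's actual classes or method signatures: the map side buffers records
in memory, then always materializes them into one partitioned data file plus
per-partition byte lengths, and the reduce side later reads its byte range
back from that file.

import java.io.{BufferedOutputStream, DataOutputStream, File, FileOutputStream, RandomAccessFile}
import scala.collection.mutable.ArrayBuffer

// Toy model of my understanding of the sort-shuffle write/read path; NOT Spark's code.
object ShuffleSketch {

  // "Map side": buffer records in memory, then always materialize them into one
  // partitioned data file, returning each partition's byte length (my understanding
  // is that Spark keeps these lengths in an index file so reducers know which byte
  // range to fetch).
  def writeMapOutput(records: Iterator[(Int, String)],
                     numPartitions: Int,
                     dataFile: File): Array[Long] = {
    val buckets = Array.fill(numPartitions)(ArrayBuffer.empty[(Int, String)])
    records.foreach { case (k, v) =>
      buckets(Math.floorMod(k, numPartitions)) += ((k, v))   // in-memory buffering
    }

    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(dataFile)))
    val lengths = new Array[Long](numPartitions)
    try {
      for (p <- 0 until numPartitions) {
        val start = out.size()
        buckets(p).foreach { case (k, v) => out.writeInt(k); out.writeUTF(v) }
        lengths(p) = (out.size() - start).toLong
      }
    } finally out.close()
    // Note: no fsync anywhere, so the bytes may still sit in the OS page cache.
    lengths
  }

  // "Reduce side": seek to the partition's offset and read its byte range back
  // from the data file; the reducer then deserializes those bytes.
  def readPartitionBytes(dataFile: File, lengths: Array[Long], partition: Int): Array[Byte] = {
    val raf = new RandomAccessFile(dataFile, "r")
    try {
      raf.seek(lengths.take(partition).sum)
      val bytes = new Array[Byte](lengths(partition).toInt)
      raf.readFully(bytes)
      bytes
    } finally raf.close()
  }
}

In this toy, the reduce side only ever sees bytes that went through the data
file, never the map side's in-memory buffers, which is what I think Spark does
as well; that behaviour is what I'm asking about.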

Thanks,
Pramod
