Hi,
I was trying to see whether I could make Spark avoid hitting the disk for small
jobs, but it looks like SortShuffleWriter.write() always writes to disk. I
found an older thread (
http://apache-spark-user-list.1001560.n3.nabble.com/How-does-shuffle-work-in-spark-td584.html)
which says that fsync is not called on this write path.

My question is: why does it always write to disk?
Does that mean the reduce phase reads the result from disk as well?
Couldn't the reduce phase read the data directly from the in-memory map/buffer
in ExternalSorter instead?
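
To make sure I'm asking the right question, here is a small, self-contained
Scala sketch of my mental model of the sort-based shuffle. This is just a toy,
not Spark's actual classes or method signatures: the map side buffers records
in memory, then always materializes them into one partitioned data file plus
per-partition byte lengths, and the reduce side later reads its byte range
back from that file.

import java.io.{BufferedOutputStream, DataOutputStream, File, FileOutputStream, RandomAccessFile}
import scala.collection.mutable.ArrayBuffer

// Toy model of my understanding of the sort-shuffle write/read path; NOT Spark's code.
object ShuffleSketch {

  // "Map side": buffer records in memory, then always materialize them into one
  // partitioned data file, returning each partition's byte length (my understanding
  // is that Spark keeps these lengths in an index file so reducers know which byte
  // range to fetch).
  def writeMapOutput(records: Iterator[(Int, String)],
                     numPartitions: Int,
                     dataFile: File): Array[Long] = {
    val buckets = Array.fill(numPartitions)(ArrayBuffer.empty[(Int, String)])
    records.foreach { case (k, v) =>
      buckets(Math.floorMod(k, numPartitions)) += ((k, v))   // in-memory buffering
    }

    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(dataFile)))
    val lengths = new Array[Long](numPartitions)
    try {
      for (p <- 0 until numPartitions) {
        val start = out.size()
        buckets(p).foreach { case (k, v) => out.writeInt(k); out.writeUTF(v) }
        lengths(p) = (out.size() - start).toLong
      }
    } finally out.close()
    // Note: no fsync anywhere, so the bytes may still sit in the OS page cache.
    lengths
  }

  // "Reduce side": seek to the partition's offset and read its byte range back
  // from the data file; the reducer then deserializes those bytes.
  def readPartitionBytes(dataFile: File, lengths: Array[Long], partition: Int): Array[Byte] = {
    val raf = new RandomAccessFile(dataFile, "r")
    try {
      raf.seek(lengths.take(partition).sum)
      val bytes = new Array[Byte](lengths(partition).toInt)
      raf.readFully(bytes)
      bytes
    } finally raf.close()
  }
}

In this toy, the reduce side only ever sees bytes that went through the data
file, never the map side's in-memory buffers, which is what I think Spark does
as well; that behaviour is what I'm asking about.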

Thanks,
Pramod
