Hello,

One of the main "selling points" of Spark is that, unlike Hadoop MapReduce, which persists the intermediate results of its computation to HDFS (on disk), Spark keeps all of its results in memory. I don't understand this, because in reality, when a Spark stage finishes, [it writes all of its data into shuffle files stored on disk](https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/4-shuffleDetails.md). How is this an improvement over MapReduce, then?
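For concreteness, here is a minimal word-count sketch (my own illustration, not taken from any particular source) of a job with a single shuffle boundary; as I understand it, the `reduceByKey` step is where the shuffle files get written:

```scala
import org.apache.spark.sql.SparkSession

object ShuffleBoundaryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shuffle-boundary")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))

    // Stage 1: map-side work is pipelined in memory; its output is
    // written to local shuffle files on disk at the stage boundary.
    val pairs = words.map(w => (w, 1))

    // reduceByKey introduces a shuffle: Stage 2 reads those shuffle files.
    val counts = pairs.reduceByKey(_ + _)

    // Within each stage, intermediate results stay in memory and nothing
    // is written to HDFS between the map and the reduce.
    counts.collect().foreach(println)

    spark.stop()
  }
}
```

So everything inside a stage is pipelined in memory, but the stage boundary itself still hits the disk, which is exactly what confuses me.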

[Image from https://youtu.be/7ooZ4S7Ay6Y]

Thanks!
