Thanks Akhil!

I suspect the root cause of the shuffle OOM I was seeing (and probably many
that users might see) is individual partitions on the reduce side not
fitting in memory. As a guideline, I was thinking of something like "be sure
that your largest partitions occupy no more than 1% of executor memory" or
something to that effect. I can add that documentation to the tuning page if
someone can suggest the best wording and numbers. I can also add a
simple Spark shell example that estimates the largest partition size to help
choose executor memory and the number of partitions.
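To make the idea concrete, here is a rough sketch of the arithmetic I have in mind. All the numbers and names (skew factor, the 1% threshold, `PartitionSizeCheck`) are illustrative assumptions for discussion, not anything read from a Spark API:

```scala
// Back-of-envelope check: does the largest shuffle partition fit
// comfortably within a small fraction of executor memory?
// All inputs here are illustrative assumptions, not values from Spark.
object PartitionSizeCheck {
  // Estimate the largest partition by assuming a skew factor
  // over the average partition size.
  def largestPartitionBytes(totalShuffleBytes: Long,
                            numPartitions: Int,
                            skewFactor: Double = 4.0): Long = {
    val avg = totalShuffleBytes.toDouble / numPartitions
    (avg * skewFactor).toLong
  }

  // Proposed guideline: largest partition stays under ~1% of executor memory.
  def fitsGuideline(largestBytes: Long, executorMemoryBytes: Long): Boolean =
    largestBytes <= executorMemoryBytes / 100

  def main(args: Array[String]): Unit = {
    val total = 100L * 1024 * 1024 * 1024      // 100 GB of shuffle data
    val executorMem = 8L * 1024 * 1024 * 1024  // 8 GB executor
    val partitions = 200
    val largest = largestPartitionBytes(total, partitions)
    println(s"largest ~ ${largest / (1024 * 1024)} MB, " +
            s"fits 1% guideline: ${fitsGuideline(largest, executorMem)}")
  }
}
```

With those example numbers the check fails, suggesting the user should raise the partition count well above 200.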

One more question: I'm trying to get my head around the shuffle code. I see
ShuffleManager, but that seems to be on the reduce side. Where is the code
driving the map-side writes and the reduce-side reads? I think it should be
possible to add up reduce-side volume for a key (they are byte reads at some
point) and raise an alarm if it's getting too high. Even a warning on the
console would be better than a catastrophic OOM.
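The kind of alarm I'm picturing could look something like this. To be clear, `PartitionVolumeMonitor` and everything in it is a hypothetical sketch for discussion; none of these names come from Spark's actual shuffle code:

```scala
// Hypothetical sketch: accumulate bytes read for the current reduce
// partition and warn once, before memory pressure becomes an OOM.
// Not based on Spark internals; names and threshold are made up.
class PartitionVolumeMonitor(warnThresholdBytes: Long) {
  private var bytesRead: Long = 0L
  private var warned = false

  // Would be called for each fetched block on the reduce side.
  def record(numBytes: Long): Unit = {
    bytesRead += numBytes
    if (!warned && bytesRead > warnThresholdBytes) {
      warned = true
      Console.err.println(
        s"WARNING: reduce partition has read $bytesRead bytes " +
        s"(threshold $warnThresholdBytes); risk of OOM")
    }
  }

  def totalBytes: Long = bytesRead
}
```

The point is just that the byte counts exist somewhere in the read path, so a cheap running total plus a one-shot warning seems feasible.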



--
Madhu
https://www.linkedin.com/in/msiddalingaiah