Thanks for the info! Are there performance impacts from writing to HDFS instead of local disk? I'm assuming that's why ALS checkpoints every third iteration instead of every iteration.
Also, I can imagine that checkpointing should be done every N shuffles instead of every N operations (counting maps), since only the shuffle leaves data on disk. Do you have any suggestions on this? We should write up some guidance on the use of checkpointing in the programming guide <https://spark.apache.org/docs/latest/programming-guide.html> - I can help with this, Andrew.
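For reference, a minimal sketch of the interval pattern being discussed: checkpointing an iteratively updated RDD every N iterations rather than every iteration. The checkpoint directory, interval value, and toy transformation are all placeholders, not what ALS actually does internally.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))
    // On a cluster this should be an HDFS (or other fault-tolerant) path.
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")

    val checkpointInterval = 3 // hypothetical: checkpoint every third iteration
    var rdd = sc.parallelize(1 to 1000)
    for (i <- 1 to 10) {
      rdd = rdd.map(_ + 1) // stand-in for one iteration's transformation
      if (i % checkpointInterval == 0) {
        rdd.checkpoint() // truncate the lineage at this point
        rdd.count()      // action to force the checkpoint to materialize
      }
    }
    sc.stop()
  }
}
```

Checkpointing every iteration would pay the HDFS write cost each pass; spacing it out trades a longer lineage (and longer recovery) for less I/O.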