Thanks for the info!

Are there performance impacts with writing to HDFS instead of local disk?
 I'm assuming that's why ALS checkpoints every third iteration instead of
every iteration.

Also I can imagine that checkpointing should be done every N shuffles
instead of every N operations (counting maps), since only the shuffle
leaves data on disk.  Do you have any suggestions on this?

We should write up some guidance on the use of checkpointing in the programming
guide <https://spark.apache.org/docs/latest/programming-guide.html> - I can
help with this

Andrew

Reply via email to