Thanks for the info! Are there performance impacts from writing to HDFS instead of local disk? I'm assuming that's why ALS checkpoints every third iteration instead of every iteration.
Also, I can imagine that checkpointing should be done every N shuffles instead of every N operations (counting maps), since only the shuffle leaves data on disk. Do you have any suggestions on this? We should write up some guidance on the use of checkpointing in the programming guide <https://spark.apache.org/docs/latest/programming-guide.html> - I can help with this, Andrew.
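For reference, a minimal sketch of the interval pattern being discussed: checkpointing an iteratively updated RDD every N iterations rather than every iteration. The checkpoint directory, interval value, and toy transformation are all placeholders, not what ALS actually does internally.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))
    // On a cluster this should be an HDFS (or other fault-tolerant) path.
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")

    val checkpointInterval = 3 // hypothetical: checkpoint every third iteration
    var rdd = sc.parallelize(1 to 1000)
    for (i <- 1 to 10) {
      rdd = rdd.map(_ + 1) // stand-in for one iteration's transformation
      if (i % checkpointInterval == 0) {
        rdd.checkpoint() // truncate the lineage at this point
        rdd.count()      // action to force the checkpoint to materialize
      }
    }
    sc.stop()
  }
}
```

Checkpointing every iteration would pay the HDFS write cost each pass; spacing it out trades a longer lineage (and longer recovery) for less I/O.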