Re: Limit Spark Shuffle Disk Usage

2015-06-17 Thread Al M
Thanks Himanshu and RahulKumar! The Databricks forum post was extremely useful. It is great to see an article that clearly details how and when shuffles are cleaned up. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Limit-Spark-Shuffle-Disk-Usage

Re: Limit Spark Shuffle Disk Usage

2015-06-16 Thread Himanshu Mehra
Try setting 'spark.shuffle.memoryFraction' to 0.4 (the default is 0.2); this should make a significant difference in the shuffle's disk usage. Thank you - Himanshu Mehra
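To make the tuning concrete, here is a minimal sketch of how the Spark 1.x legacy memory model sizes the in-memory shuffle buffer. The accounting is simplified for illustration, and the 0.8 safety fraction mirrors the spark.shuffle.safetyFraction default; this is not Spark's exact implementation:

```python
# Approximate the in-memory shuffle budget under the Spark 1.x legacy
# memory model: heap * spark.shuffle.memoryFraction * safetyFraction.
# Simplified sketch; the real accounting differs in detail.
def shuffle_memory_bytes(executor_heap_bytes, memory_fraction=0.2,
                         safety_fraction=0.8):
    """Rough bytes available for in-memory shuffle aggregation."""
    return int(executor_heap_bytes * memory_fraction * safety_fraction)

heap = 4 * 1024**3  # a 4 GiB executor heap

# Raising memoryFraction from 0.2 to 0.4 doubles the in-memory budget,
# so less shuffle data spills to disk (at the cost of storage memory).
default_budget = shuffle_memory_bytes(heap)
tuned_budget = shuffle_memory_bytes(heap, memory_fraction=0.4)
```

Note that whatever does not fit in this budget is what spills to local disk, which is why raising the fraction shrinks the on-disk footprint.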

Re: Limit Spark Shuffle Disk Usage

2015-06-15 Thread rahulkumar-aws
- rahulkumar (SigmoidAnalytics), India

Re: Limit Spark Shuffle Disk Usage

2015-06-12 Thread Akhil Das
there for a good reason.

Limit Spark Shuffle Disk Usage

2015-06-11 Thread Al M

Re: Spark and disk usage.

2014-09-21 Thread Andrew Ash
> Thanks for the info! Are there performance impacts with writing to HDFS instead of local disk? I'm assuming that's why ALS checkpoints every third iteration instead of every iteration. Also

Spark and disk usage.

2014-09-17 Thread Макар Красноперов
Hello everyone. The problem is that Spark writes data to disk very heavily, even though the application has a lot of free memory (about 3.8 GB). I've noticed that a folder with a name like spark-local-20140917165839-f58c contains a lot of other folders with files like shuffle_446_0_1. The total size of
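As an illustration of how one might quantify what the message describes, here is a small sketch that sums the size of shuffle_* files under Spark's local directories. The directory layout and the file-name prefix are assumptions taken from the folder names quoted above:

```python
import os

def shuffle_disk_usage(local_dirs):
    """Total bytes used by files whose names start with 'shuffle_'
    anywhere under the given Spark local directories."""
    total = 0
    for root_dir in local_dirs:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                if name.startswith("shuffle_"):
                    total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Pointing this at the spark.local.dir location(s) gives a rough view of how much disk the shuffle files are holding at any given moment.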

Re: Spark and disk usage.

2014-09-17 Thread Burak Yavuz
the current status instead and will clean up files from disk. Best, Burak

Re: Spark and disk usage.

2014-09-17 Thread Andrew Ash
> directory in HDFS, where Spark will write the current status instead and will clean up files from disk. Best, Burak

Re: Spark and disk usage.

2014-09-17 Thread Burak Yavuz
the directory will not be enough. Best, Burak

Re: Spark and disk usage.

2014-09-17 Thread Andrew Ash
Thanks for the info! Are there performance impacts with writing to HDFS instead of local disk? I'm assuming that's why ALS checkpoints every third iteration instead of every iteration. Also I can imagine that checkpointing should be done every N shuffles instead of every N operations (counting
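The cadence being suggested here can be sketched in a few lines. The function name and the "every N shuffles" trigger below are illustrative, not Spark's actual ALS checkpointing code:

```python
def checkpoint_points(operations, every_n_shuffles=3):
    """Indices of operations at which to checkpoint, triggered by
    counting shuffle-producing steps rather than all operations."""
    points = []
    shuffles_seen = 0
    for i, op in enumerate(operations):
        if op == "shuffle":
            shuffles_seen += 1
            if shuffles_seen % every_n_shuffles == 0:
                points.append(i)
    return points

# A lineage mixing narrow ops and shuffles: only the shuffle steps
# count toward the checkpoint interval.
ops = ["map", "shuffle", "filter", "shuffle", "shuffle", "map", "shuffle"]
```

Counting shuffles directly ties the checkpoint frequency to how fast shuffle files accumulate on disk, which an interval over all operations only approximates.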

Re: Spark and disk usage.

2014-09-17 Thread Burak Yavuz
> Thanks for the info! Are there performance impacts with writing to HDFS instead of local disk? I'm assuming that's why ALS checkpoints every third iteration instead of every iteration. Also I can imagine