Thanks Himanshu and RahulKumar!
The Databricks forum post was extremely useful. It is great to see an
article that clearly details how and when shuffle files are cleaned up.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Limit-Spark-Shuffle-Disk-Usage
Try setting 'spark.shuffle.memoryFraction' to 0.4 (it is 0.2 by default);
this should make a significant difference in the disk use of the shuffle.
Thank you
-
Himanshu Mehra (SigmoidAnalytics), India
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Limit-Spark-Shuffle-Disk-Usage-tp23279p23334.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
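To see why raising that fraction matters, it helps to estimate the in-memory shuffle budget. The sketch below is a back-of-the-envelope model assuming the pre-1.6 static memory management, where shuffle space is roughly heap × spark.shuffle.memoryFraction × spark.shuffle.safetyFraction (safetyFraction defaulting to 0.8) and anything beyond it spills to disk; the helper function is illustrative, not a Spark API.

```python
# Back-of-the-envelope model of the Spark 1.x shuffle memory budget
# (assumption: pre-1.6 static memory management, where in-memory shuffle
# space is roughly heap * memoryFraction * safetyFraction and anything
# beyond it spills to disk; safetyFraction is assumed to default to 0.8).

def shuffle_budget_mb(heap_mb, memory_fraction=0.2, safety_fraction=0.8):
    """Approximate in-memory shuffle space (MB) before spilling starts."""
    return heap_mb * memory_fraction * safety_fraction

# With ~3.8 GB of heap, as in the original question:
default_budget = shuffle_budget_mb(3800)                      # ~608 MB
raised_budget = shuffle_budget_mb(3800, memory_fraction=0.4)  # ~1216 MB
```

Doubling the fraction doubles the budget, so shuffles that previously spilled at ~600 MB stay in memory up to roughly twice that.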
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Limit-Spark-Shuffle-Disk-Usage-tp23279p23323.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
[...] there for a good reason.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Limit-Spark-Shuffle-Disk-Usage-tp23279.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Sent: Wednesday, September 17, 2014 11:04:02 AM
Subject: Re: Spark and disk usage.
Thanks for the info!
Are there performance impacts with writing to HDFS instead of local disk?
I'm assuming that's why ALS checkpoints every third iteration instead of
every iteration.
Also [...]
You can set a directory in
HDFS, where Spark will write the current status instead and will clean up
files from disk.
Best,
Burak
- Original Message -
From: Макар Красноперов connector@gmail.com
To: user@spark.apache.org
Sent: Wednesday, September 17, 2014 7:37:49 AM
Subject: Spark and disk usage.
Hello everyone.
The problem is that Spark writes data to disk very aggressively, even when
the application has a lot of free memory (about 3.8 GB).
I've noticed that a folder with a name like
spark-local-20140917165839-f58c contains a lot of other folders with
files like shuffle_446_0_1. The total size of [...]
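Burak's suggestion corresponds to RDD checkpointing: SparkContext.setCheckpointDir plus RDD.checkpoint() write the data to HDFS and truncate the RDD's lineage, after which the shuffle files backing the earlier stages are no longer needed. As a Spark-free illustration of why that permits cleanup, here is a toy lineage model; ToyRDD and the stage names are invented for illustration and are not Spark APIs.

```python
# Toy model (pure Python, no Spark) of lineage truncation: checkpointing
# persists the current result, so the shuffle files that back the earlier
# stages of the lineage become unnecessary and can be cleaned from disk.
# ToyRDD and the stage names below are hypothetical.

class ToyRDD:
    def __init__(self, stages):
        # shuffle files each upstream stage still pins on local disk
        self.stages = list(stages)
        self.checkpointed = False

    def checkpoint(self):
        # a real checkpoint writes partitions to the checkpoint directory;
        # afterwards only the checkpointed result itself must be kept
        self.checkpointed = True
        self.stages = self.stages[-1:]

rdd = ToyRDD(["shuffle_444", "shuffle_445", "shuffle_446"])
rdd.checkpoint()
# rdd.stages now holds only "shuffle_446"; earlier shuffle files are droppable
```

In real Spark the checkpoint only takes effect once an action materializes the RDD, so a count() (or similar) typically follows the checkpoint() call.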
[...] the directory will not be enough.
Best,
Burak
- Original Message -
From: Andrew Ash and...@andrewash.com
To: Burak Yavuz bya...@stanford.edu
Cc: Макар Красноперов connector@gmail.com, user
user@spark.apache.org
Sent: Wednesday, September 17, 2014 10:19:42 AM
Subject: Re: Spark and disk usage.
Hi
Thanks for the info!
Are there performance impacts with writing to HDFS instead of local disk?
I'm assuming that's why ALS checkpoints every third iteration instead of
every iteration.
Also I can imagine that checkpointing should be done every N shuffles
instead of every N operations (counting [...]
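Andrew's idea of counting shuffles rather than operations can be sketched as a small scheduling helper. This is a pure-Python illustration under stated assumptions: the operation names, the set of wide (shuffle-producing) transformations, and checkpoint_points are all hypothetical, not a Spark API.

```python
# Sketch of "checkpoint every N shuffles instead of every N operations":
# count only the wide (shuffle-producing) transformations when deciding
# where a checkpoint would be taken. The op list and helper are invented.

SHUFFLE_OPS = {"reduceByKey", "groupByKey", "join", "sortByKey"}

def checkpoint_points(ops, every_shuffles=3):
    """Indices of operations after which a checkpoint would be taken,
    counting shuffle operations only."""
    points, shuffles = [], 0
    for i, op in enumerate(ops):
        if op in SHUFFLE_OPS:
            shuffles += 1
            if shuffles % every_shuffles == 0:
                points.append(i)
    return points

ops = ["map", "reduceByKey", "filter", "join", "map", "groupByKey", "map"]
# the third shuffle is the groupByKey at index 5, so that is where a
# checkpoint would fire; narrow ops like map and filter are not counted
```

Tying the cadence to shuffles rather than operations makes sense here because it is the shuffle files, not the narrow transformations, that accumulate on local disk.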