Hi SK,

For the problem with lots of shuffle files and the "too many open files"
exception there are a couple options:

1. Linux limits the number of files a process can have open at once.  The
per-process limit is shown and set with ulimit -n, and can be raised
permanently via /etc/security/limits.conf (the system-wide cap, fs.file-max,
lives in /etc/sysctl.conf or /etc/sysctl.d/).  Try increasing it to a large
value, at a bare minimum the square of your partition count, since the
hash-based shuffle can create roughly one file per map task per reduce
task.  A sketch follows after this list.
2. Try shuffle consolidation by setting spark.shuffle.consolidateFiles=true.
This option writes far fewer files to disk, so it is much less likely to
hit the open-file limit (see the configuration sketch after this list).
3. Try the sort-based shuffle by setting spark.shuffle.manager=SORT.
You should likely hold off on this one until
https://issues.apache.org/jira/browse/SPARK-3032 is fixed, hopefully in
1.1.1.
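
For option 1, a rough sketch of the ulimit change (untested here; 65536 is
just an example value, and "sparkuser" is a placeholder for whichever
account launches the executors):

    # check the current per-process limit in this shell
    ulimit -n

    # raise it for this shell (the hard limit must allow it)
    ulimit -n 65536

    # to persist across logins, add to /etc/security/limits.conf:
    sparkuser  soft  nofile  65536
    sparkuser  hard  nofile  65536

Note the new limit only applies to processes started after it takes
effect, so the Spark workers need to be restarted from a session that
has it.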
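
And a minimal sketch of setting options 2 and 3 programmatically
(Spark 1.1-era API; the app name is a placeholder, and the same settings
can equally go in conf/spark-defaults.conf):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("shuffle-tuning-example")          // placeholder name
      .set("spark.shuffle.consolidateFiles", "true") // option 2
      // option 3, once SPARK-3032 is resolved:
      // .set("spark.shuffle.manager", "SORT")
    val sc = new SparkContext(conf)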

Hope that helps!
Andrew

On Thu, Sep 25, 2014 at 4:20 PM, SK <skrishna...@gmail.com> wrote:

> Hi,
>
> I am using Spark 1.1.0 on a cluster. My job takes as input 30 files in a
> directory (I am using sc.textFile("dir/*") to read them in).  I am
> getting the following warning:
>
> WARN TaskSetManager: Lost task 99.0 in stage 1.0 (TID 99,
> mesos12-dev.sccps.net): java.io.FileNotFoundException:
> /tmp/spark-local-20140925215712-0319/12/shuffle_0_99_93138 (Too many open
> files)
>
> Basically, I think a lot of shuffle files are being created.
>
> 1) The tasks eventually fail and the job just hangs (after taking a very
> long time, more than an hour).  If I read these 30 files in a for loop
> instead, the same job completes in a few minutes. However, I then need to
> specify the file names explicitly, which is not convenient. I am assuming
> that sc.textFile("dir/*") creates one large RDD for all 30 files. Is
> there a way to make the operation on this large RDD efficient so as to
> avoid creating too many shuffle files?
>
>
> 2) Also, I am finding that the shuffle files from my other, completed
> jobs are not being automatically deleted, even after days. I thought that
> sc.stop() clears the intermediate files.  Is there some way to
> programmatically delete these temp shuffle files upon job completion?
>
>
> thanks
>
