Re: No space left on device error when pulling data from s3

2014-05-15 Thread darkjh
this in the spark-ec2 script. Writing lots of temp files to the 8 GB `/` volume is not a great idea.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/No-space-left-on-device-error-when-pulling-data-from-s3-tp5450p5518.html
Sent from the Apache Spark User List mailing list archive

No space left on device error when pulling data from s3

2014-05-06 Thread Han JU
Hi, I get a `no space left on device` exception when pulling about 22 GB of data from S3 block storage into the ephemeral HDFS. The cluster is on EC2, launched with the spark-ec2 script, with 4 m1.large instances. The code is basically:

    val in = sc.textFile("s3://...")
    in.saveAsTextFile("hdfs://...")

Spark creates 750 input

Re: No space left on device error when pulling data from s3

2014-05-06 Thread Akhil Das
I wonder why your `/` is full. Try clearing out `/tmp`, and also make sure that in spark-env.sh you have set:

    SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark"

Thanks
Best Regards

On Tue, May 6, 2014 at 9:35 PM, Han JU ju.han.fe...@gmail.com wrote:
> Hi, I've a `no space left on device` exception
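To see why `/` fills up, it can help to compare the usable space on the candidate scratch directories before pointing `spark.local.dir` at one of them. The sketch below is a hypothetical helper (`ScratchDirCheck` is not part of Spark or spark-ec2). It assumes the candidates are `/`, `/tmp` and `/mnt` unless directories are passed as arguments, and it relies only on `java.io.File.getUsableSpace`.

```scala
import java.io.File

// Hypothetical helper (not part of Spark or spark-ec2): report usable space on
// candidate scratch directories, e.g. before choosing a value for spark.local.dir.
object ScratchDirCheck {
  // Usable bytes on the filesystem that holds `path`; 0 if the path does not exist.
  def usableBytes(path: String): Long = {
    val f = new File(path)
    if (f.exists()) f.getUsableSpace else 0L
  }

  // The candidate with the most usable space, or None if there are no candidates.
  def roomiest(candidates: Seq[String]): Option[String] =
    if (candidates.isEmpty) None else Some(candidates.maxBy(p => usableBytes(p)))

  def main(args: Array[String]): Unit = {
    // Assumed default candidates; pass other directories as arguments to override.
    val candidates: Seq[String] = if (args.nonEmpty) args.toSeq else Seq("/", "/tmp", "/mnt")
    candidates.foreach { p =>
      val gib = usableBytes(p) / (1024L * 1024 * 1024)
      println(f"$p%-10s $gib%d GiB usable")
    }
    roomiest(candidates).foreach(d => println(s"roomiest: $d"))
  }
}
```

Run it on each node. If `/` has far less room than `/mnt`, that supports moving scratch files off the root volume. Note that `spark.local.dir` also accepts a comma-separated list of directories on different disks.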

Re: No space left on device error when pulling data from s3

2014-05-06 Thread Han JU
After some investigation, I found out that there are lots of temp files under `/tmp/hadoop-root/s3/`. But this is strange, since in both conf files, `~/ephemeral-hdfs/conf/core-site.xml` and `~/spark/conf/core-site.xml`, the setting `hadoop.tmp.dir` is set to `/mnt/ephemeral-hdfs/`. Why do Spark jobs still
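The path `/tmp/hadoop-root/s3/` is exactly what Hadoop's stock defaults produce. In `core-default.xml`, `hadoop.tmp.dir` defaults to `/tmp/hadoop-${user.name}`, and `fs.s3.buffer.dir` (where the `s3://` block filesystem buffers data on local disk) defaults to `${hadoop.tmp.dir}/s3`. With user `root`, files in that directory therefore suggest that the Hadoop configuration seen by the job never picked up the `hadoop.tmp.dir` override from `core-site.xml`. The sketch below mimics only the `${...}` expansion to show both outcomes. It is a simplified illustration, not Hadoop's `Configuration` class, and it assumes the Hadoop 1.x/2.x defaults quoted above.

```scala
import scala.util.matching.Regex

// Simplified mimic of Hadoop-style ${name} expansion, for illustration only.
// This is NOT Hadoop's Configuration class.
object TmpDirExpansion {
  private val VarPattern: Regex = """\$\{([^}]+)\}""".r

  // Expand ${name} using system properties first, then config entries (expanded
  // recursively). Unresolved references are left as-is.
  def expand(value: String, conf: Map[String, String], sys: Map[String, String]): String =
    VarPattern.replaceAllIn(value, (m: Regex.Match) => {
      val name = m.group(1)
      val resolved = sys.get(name)
        .orElse(conf.get(name).map(v => expand(v, conf, sys)))
        .getOrElse(m.matched)
      Regex.quoteReplacement(resolved)
    })

  // Assumed defaults, as listed in Hadoop 1.x/2.x core-default.xml.
  val defaults: Map[String, String] = Map(
    "hadoop.tmp.dir"   -> "/tmp/hadoop-${user.name}",
    "fs.s3.buffer.dir" -> "${hadoop.tmp.dir}/s3"
  )

  def main(args: Array[String]): Unit = {
    val sys = Map("user.name" -> "root")
    // Only the defaults are loaded: the S3 buffer lands on the root volume.
    println(expand(defaults("fs.s3.buffer.dir"), defaults, sys)) // /tmp/hadoop-root/s3
    // With the core-site.xml override applied, the buffer follows hadoop.tmp.dir.
    val withSite = defaults + ("hadoop.tmp.dir" -> "/mnt/ephemeral-hdfs/")
    println(expand(withSite("fs.s3.buffer.dir"), withSite, sys)) // /mnt/ephemeral-hdfs//s3
  }
}
```

The doubled slash in the second result comes from the trailing `/` in the configured value and is harmless. If the override really is not being loaded, one thing to try (untested here) is setting the buffer directory on the job's Hadoop configuration before reading, e.g. `sc.hadoopConfiguration.set("fs.s3.buffer.dir", "/mnt/s3-buffer")`, where `/mnt/s3-buffer` is a hypothetical path.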