Hi Xiangrui,

I accidentally did not send df -i for the master node. Here it is at the
moment of failure:
Filesystem      Inodes  IUsed     IFree IUse% Mounted on
/dev/xvda1      524288 280938    243350   54% /
tmpfs          3845409      1   3845408    1% /dev/shm
/dev/xvdb     10002432   1027  10001405    1% /mnt
/dev/xvdf     10002432     16  10002416    1% /mnt2
/dev/xvdv    524288000     13 524287987    1% /vol

I am using default settings now, but is there a way to make sure that the
proper directories are being used? How many blocks/partitions do you
recommend?

Chris

On Wed, Jul 16, 2014 at 1:09 AM, Chris DuBois <chris.dub...@gmail.com> wrote:
> Hi Xiangrui,
>
> Here is the result on the master node:
> $ df -i
> Filesystem      Inodes  IUsed     IFree IUse% Mounted on
> /dev/xvda1      524288 273997    250291   53% /
> tmpfs          1917974      1   1917973    1% /dev/shm
> /dev/xvdv    524288000     30 524287970    1% /vol
>
> I have reproduced the error while using the MovieLens 10M data set on a
> newly created cluster.
>
> Thanks for the help.
> Chris
>
>
> On Wed, Jul 16, 2014 at 12:22 AM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> Hi Chris,
>>
>> Could you also try `df -i` on the master node? How many
>> blocks/partitions did you set?
>>
>> In the current implementation, ALS doesn't clean the shuffle data
>> because the operations are chained together. But it shouldn't run out
>> of disk space on the MovieLens dataset, which is small. The spark-ec2
>> script sets /mnt/spark and /mnt/spark2 as spark.local.dir by default;
>> I would recommend leaving this setting at its default value.
>>
>> Best,
>> Xiangrui
>>
>> On Wed, Jul 16, 2014 at 12:02 AM, Chris DuBois <chris.dub...@gmail.com>
>> wrote:
>> > Thanks for the quick responses!
>> >
>> > I used your final -Dspark.local.dir suggestion, but I see this during
>> > the initialization of the application:
>> >
>> > 14/07/16 06:56:08 INFO storage.DiskBlockManager: Created local
>> > directory at /vol/spark-local-20140716065608-7b2a
>> >
>> > I would have expected something in /mnt/spark/.
>> >
>> > Thanks,
>> > Chris
>> >
>> >
>> > On Tue, Jul 15, 2014 at 11:44 PM, Chris Gore <cdg...@cdgore.com> wrote:
>> >>
>> >> Hi Chris,
>> >>
>> >> I've encountered this error when running Spark's ALS methods too. In
>> >> my case, it was because I set spark.local.dir improperly, and every
>> >> time there was a shuffle, it would spill many GB of data onto the
>> >> local drive. What fixed it was setting it to use the /mnt directory,
>> >> where a network drive is mounted. For example, setting an environment
>> >> variable:
>> >>
>> >> export SPACE=$(mount | grep mnt | awk '{print $3"/spark/"}' | xargs | sed 's/ /,/g')
>> >>
>> >> Then adding -Dspark.local.dir=$SPACE or simply
>> >> -Dspark.local.dir=/mnt/spark/,/mnt2/spark/ when you run your driver
>> >> application.
>> >>
>> >> Chris
>> >>
>> >> On Jul 15, 2014, at 11:39 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>
>> >> > Check the number of inodes (df -i). The assembly build may create
>> >> > many small files. -Xiangrui
>> >> >
>> >> > On Tue, Jul 15, 2014 at 11:35 PM, Chris DuBois
>> >> > <chris.dub...@gmail.com> wrote:
>> >> >> Hi all,
>> >> >>
>> >> >> I am encountering the following error:
>> >> >>
>> >> >> INFO scheduler.TaskSetManager: Loss was due to java.io.IOException:
>> >> >> No space left on device [duplicate 4]
>> >> >>
>> >> >> For each slave, df -h looks roughly like this, which makes the
>> >> >> above error surprising.
>> >> >>
>> >> >> Filesystem      Size  Used Avail Use% Mounted on
>> >> >> /dev/xvda1      7.9G  4.4G  3.5G  57% /
>> >> >> tmpfs           7.4G  4.0K  7.4G   1% /dev/shm
>> >> >> /dev/xvdb        37G  3.3G   32G  10% /mnt
>> >> >> /dev/xvdf        37G  2.0G   34G   6% /mnt2
>> >> >> /dev/xvdv       500G   33M  500G   1% /vol
>> >> >>
>> >> >> I'm on an EC2 cluster (c3.xlarge + 5 x m3) that I launched using
>> >> >> the spark-ec2 scripts and a clone of spark from today. The job I am
>> >> >> running closely resembles the collaborative filtering example.
>> >> >> This issue happens with the 1M version as well as the 10 million
>> >> >> rating version of the MovieLens dataset.
>> >> >>
>> >> >> I have seen previous questions, but they haven't helped yet. For
>> >> >> example, I tried setting the Spark tmp directory to the EBS volume
>> >> >> at /vol/, both by editing the spark conf file (and copy-dir'ing it
>> >> >> to the slaves) as well as through the SparkConf. Yet I still get
>> >> >> the above error. Here is my current Spark config. Note that I'm
>> >> >> launching via ~/spark/bin/spark-submit.
>> >> >>
>> >> >> conf = SparkConf()
>> >> >> conf.setAppName("RecommendALS") \
>> >> >>     .set("spark.local.dir", "/vol/") \
>> >> >>     .set("spark.executor.memory", "7g") \
>> >> >>     .set("spark.akka.frameSize", "100") \
>> >> >>     .setExecutorEnv("SPARK_JAVA_OPTS", " -Dspark.akka.frameSize=100")
>> >> >> sc = SparkContext(conf=conf)
>> >> >>
>> >> >> Thanks for any advice,
>> >> >> Chris
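[Editor's note] The comma-joined spark.local.dir value that Chris Gore's
`mount | grep mnt | awk | sed` one-liner produces can also be built in
plain Python, which may be handy from a PySpark driver script. This is a
sketch only; the mount points below are hypothetical examples, and on a
real cluster you would pass the result to SparkConf's "spark.local.dir"
or to -Dspark.local.dir yourself.

```python
def local_dirs(mounts):
    """Build a comma-separated spark.local.dir value from mount points,
    e.g. ["/mnt", "/mnt2"] -> "/mnt/spark/,/mnt2/spark/".

    Mirrors the shell one-liner quoted above:
    mount | grep mnt | awk '{print $3"/spark/"}' | xargs | sed 's/ /,/g'
    """
    return ",".join(m.rstrip("/") + "/spark/" for m in mounts)

if __name__ == "__main__":
    # Hypothetical mount points matching the df output in this thread.
    print(local_dirs(["/mnt", "/mnt2"]))  # prints /mnt/spark/,/mnt2/spark/
```

The resulting string can then be used where the thread uses $SPACE, e.g.
conf.set("spark.local.dir", local_dirs(["/mnt", "/mnt2"])).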