Re: spark 1.2 writing on parquet after a join never ends - GC problems
Could anyone figure out what is going on in my Spark cluster? Thanks in advance, Paolo

Sent from my Windows Phone
From: Paolo Platter <paolo.plat...@agilelab.it>
Sent: 06/02/2015 10:48
To: user@spark.apache.org
Subject: spark 1.2 writing on parquet after a join never ends - GC problems

Hi all, I'm experiencing strange behaviour in Spark 1.2. I have a 3-node cluster plus the master. Each node has: 1 HDD (7200 rpm, 1 TB), 16 GB RAM, 8 cores. I configured executors with 6 cores and 10 GB each (spark.storage.memoryFraction = 0.6). My job is pretty simple:

```
val file1 = sc.parquetFile("path1") // 19M rows
val file2 = sc.textFile("path2")    // 12K rows
val join = file1.as('f1).join(file2.as('f2), LeftOuter, Some("f1.field".attr === "f2.field".attr))
join.map( _.toCaseClass() ).saveAsParquetFile("path3")
```

When I run this job in the spark-shell without writing the parquet file, using a final count to execute the pipeline, it's pretty fast. When I submit the application to the cluster with the saveAsParquetFile instruction, task execution slows progressively and never ends. I debugged this behaviour and found that the cause is the executors' disconnection due to missing heartbeats. The missing heartbeats are, in my opinion, related to GC (here is a piece of the GC log from one of the executors):

```
484.861: [GC [PSYoungGen: 2053788K->718157K(2561024K)] 7421222K->6240219K(9551872K), 2.6802130 secs] [Times: user=1.94 sys=0.60, real=2.68 secs]
497.751: [GC [PSYoungGen: 2560845K->782081K(2359808K)] 8082907K->6984335K(9350656K), 4.8611660 secs] [Times: user=3.66 sys=1.55, real=4.86 secs]
510.654: [GC [PSYoungGen: 2227457K->625664K(2071552K)] 8429711K->7611342K(9062400K), 22.5727850 secs] [Times: user=3.34 sys=2.43, real=22.57 secs]
533.745: [Full GC [PSYoungGen: 625664K->0K(2071552K)] [ParOldGen: 6985678K->2723917K(6990848K)] 7611342K->2723917K(9062400K) [PSPermGen: 62290K->62290K(124928K)], 56.9075910 secs] [Times: user=65.28 sys=5.91, real=56.90 secs]
667.637: [GC [PSYoungGen: 1445376K->623184K(2404352K)] 4169293K->3347101K(9395200K), 11.7959290 secs] [Times: user=1.58 sys=0.60, real=11.79 secs]
690.936: [GC [PSYoungGen: 1973328K->584256K(2422784K)] 4697245K->3932841K(9413632K), 39.3594850 secs] [Times: user=2.88 sys=0.96, real=39.36 secs]
789.891: [GC [PSYoungGen: 1934400K->585552K(2434048K)] 5282985K->4519857K(9424896K), 17.4456720 secs] [Times: user=2.65 sys=1.36, real=17.44 secs]
814.697: [GC [PSYoungGen: 1951056K->330109K(2426880K)] 5885361K->4851426K(9417728K), 20.9578300 secs] [Times: user=1.64 sys=0.81, real=20.96 secs]
842.968: [GC [PSYoungGen: 1695613K->180290K(2489344K)] 6216930K->4888775K(9480192K), 3.2760780 secs] [Times: user=0.40 sys=0.30, real=3.28 secs]
886.660: [GC [PSYoungGen: 1649218K->427552K(2475008K)] 6357703K->5239028K(9465856K), 5.4738210 secs] [Times: user=1.47 sys=0.25, real=5.48 secs]
897.979: [GC [PSYoungGen: 1896480K->634144K(2487808K)] 6707956K->5874208K(9478656K), 23.6440110 secs] [Times: user=2.63 sys=1.11, real=23.64 secs]
929.706: [GC [PSYoungGen: 2169632K->663200K(2199040K)] 7409696K->6538992K(9189888K), 39.3632270 secs] [Times: user=3.36 sys=1.71, real=39.36 secs]
1006.206: [GC [PSYoungGen: 2198688K->655584K(2449920K)] 8074480K->7196224K(9440768K), 98.5040880 secs] [Times: user=161.53 sys=6.71, real=98.49 secs]
1104.790: [Full GC [PSYoungGen: 655584K->0K(2449920K)] [ParOldGen: 6540640K->6290292K(6990848K)] 7196224K->6290292K(9440768K) [PSPermGen: 62247K->62247K(131072K)], 610.0023700 secs] [Times: user=1630.17 sys=27.80, real=609.93 secs]
1841.916: [Full GC [PSYoungGen: 1440256K->0K(2449920K)] [ParOldGen: 6290292K->6891868K(6990848K)] 7730548K->6891868K(9440768K) [PSPermGen: 62266K->62266K(131072K)], 637.4852230 secs] [Times: user=2035.09 sys=36.09, real=637.40 secs]
2572.012: [Full GC [PSYoungGen: 1440256K->509513K(2449920K)] [ParOldGen: 6891868K->6990703K(6990848K)] 8332124K->7500217K(9440768K) [PSPermGen: 62275K->62275K(129024K)], 698.2497860 secs] [Times: user=2261.54 sys=37.63, real=698.26 secs]
3326.711: [Full GC
```

It might seem that the file-writing operation is simply too slow and is the bottleneck, but then I tried to change my algorithm in the following way:

```
val file1 = sc.parquetFile("path1") // 19M rows
val file2 = sc.textFile("path2")    // 12K rows
val bFile2 = sc.broadcast( file2.collect.groupBy( f2 => f2.field ) ) // broadcast the smaller file as a Map

file1.map( f1 => ( f1, bFile2.value( f1.field ).head ) ) // manual join
  .map( _.toCaseClass() )
  .saveAsParquetFile("path3")
```

This way the task is fast and ends without problems, so now I'm pretty confused.

* The join works well if I use count as the final action
* The Parquet write works well without the preceding join operation
* The Parquet write after the join never ends, and I detected GC problems

Can anyone figure out what's happening? Thanks, Paolo
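For anyone hitting similar full-GC stalls, the usual first knob is the executor JVM options plus the storage fraction Paolo already mentions. A minimal sketch of the submit side; the flag values are illustrative assumptions, not a recommendation from this thread:

```
# Sketch only: collector choice and fraction value are assumptions to experiment with.
spark-submit \
  --conf spark.storage.memoryFraction=0.3 \
  --conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  ...
```

Lowering spark.storage.memoryFraction leaves more heap for shuffle and join buffers, which is where the old-generation pressure in the log above appears to come from.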
Re: Installing a python library along with ec2 cluster
Hi, You can make an image (AMI) of an EC2 instance with all the python libraries installed, and create a bash script in the /etc/init.d/ directory to export PYTHONPATH. Then you can launch the cluster with this image and ec2.py. Hope this can be helpful. Cheers, Gen

On Sun, Feb 8, 2015 at 9:46 AM, Chengi Liu chengi.liu...@gmail.com wrote: Hi, I want to install a couple of python libraries (pip install python_library) which I want to use on a pyspark cluster built using the ec2 scripts. Is there a way to specify these libraries when I am building those ec2 clusters? What's the best way to install these libraries on each ec2 node? Thanks
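A minimal sketch of Gen's suggestion; the script name and library path below are made-up placeholders:

```
#!/bin/bash
# Hypothetical /etc/init.d/pyspark-env.sh, baked into the custom AMI.
# Makes the pre-installed libraries visible to PySpark on every node.
export PYTHONPATH=/usr/local/lib/mypython:$PYTHONPATH
```

Note that an export in an init script only affects processes launched from it, so another common approach is to set PYTHONPATH in Spark's conf/spark-env.sh on each node instead.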
Re: no space left at worker node
Hi, In fact, I met this problem before; it is a bug in AWS. Which type of machine do you use? If I guess right, you can check the file /etc/fstab: there will be a double mount of /dev/xvdb. If yes, you should: 1. stop hdfs; 2. umount /dev/xvdb at /; 3. restart hdfs. Hope this could be helpful. Cheers, Gen

On Sun, Feb 8, 2015 at 8:16 AM, ey-chih chow eyc...@hotmail.com wrote: Hi, I submitted a spark job to an ec2 cluster, using spark-submit. At a worker node, there is an exception of 'no space left on device' as follows.

```
15/02/08 01:53:38 ERROR logging.FileAppender: Error writing stream to file /root/spark/work/app-20150208014557-0003/0/stdout
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.spark.util.logging.FileAppender.appendToFile(FileAppender.scala:92)
        at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:72)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
```

The command df showed the following information at the worker node:

```
Filesystem     1K-blocks      Used  Available Use% Mounted on
/dev/xvda1       8256920   8256456          0 100% /
tmpfs            7752012         0    7752012   0% /dev/shm
/dev/xvdb       30963708   1729652   27661192   6% /mnt
```

Does anybody know how to fix this? Thanks. Ey-Chih Chow

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/no-space-left-at-worker-node-tp21545.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
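A sketch of Gen's three steps on a spark-ec2 worker; the HDFS script paths are assumed from the standard spark-ec2 layout and may differ on your image:

```
# 1. stop hdfs (path assumed for a spark-ec2 AMI; adjust to your install)
/root/ephemeral-hdfs/bin/stop-dfs.sh

# 2. remove the spurious mount of the ephemeral disk on /
umount /dev/xvdb

# 3. restart hdfs
/root/ephemeral-hdfs/bin/start-dfs.sh
```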
Mesos coarse mode not working (fine grained does)
Hi, I'm trying to get coarse mode to work under Mesos (0.21.0). I thought this would be a trivial change, as Mesos was working well in fine-grained mode. However, the Mesos tasks fail, and I can't pinpoint where things go wrong. This is a Mesos stderr log from a slave:

```
Fetching URI 'http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz'
I0208 12:57:45.415575 25720 fetcher.cpp:126] Downloading 'http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz' to '/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151/spark-1.2.0-bin-hadoop2.4.tgz'
I0208 12:58:09.146960 25720 fetcher.cpp:64] Extracted resource '/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151/spark-1.2.0-bin-hadoop2.4.tgz' into '/local/vdbogert/var/lib/mesos//slaves/20150206-110658-16813322-5050-5515-S1/frameworks/20150208-125721-906005770-5050-32371-/executors/0/runs/cb525b32-387c-4698-a27e-8d4213080151'
```

The Mesos slaves' stdout is empty. And I can confirm the Spark distro is correctly extracted:

```
$ ls
spark-1.2.0-bin-hadoop2.4  spark-1.2.0-bin-hadoop2.4.tgz  stderr  stdout
```

The spark-submit log is here: http://pastebin.com/ms3uZ2BK Mesos-master: http://pastebin.com/QH2Vn1jX Mesos-slave: http://pastebin.com/DXFYemix

Can somebody point me to logs, etc. to investigate this further? I'm feeling kind of blind. Furthermore, do the executors on Mesos inherit all configs from the Spark application/submit? E.g. I've given my executors 20GB of memory through a spark-submit "--conf" parameter. Should these settings also be present in the spark-1.2.0-bin-hadoop2.4.tgz distribution's configs? If, in order to be helped here, I need to present more logs etc., please let me know. Regards, Hans van den Bogert

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
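For reference, coarse-grained mode in Spark 1.2 is toggled by a single property, so the submit side would look roughly like this sketch (the master host is a placeholder; the executor URI is the one from the log above; the memory value is the 20GB mentioned in the question):

```
spark-submit \
  --master mesos://<mesos-master>:5050 \
  --conf spark.mesos.coarse=true \
  --conf spark.executor.memory=20g \
  --conf spark.executor.uri=http://upperpaste.com/spark-1.2.0-bin-hadoop2.4.tgz \
  ...
```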
RE: no space left at worker node
Gen, Thanks for your information. The content of /etc/fstab at the worker node (r3.large) is:

```
#
LABEL=/    /          ext4    defaults,noatime                                  1 1
tmpfs      /dev/shm   tmpfs   defaults                                          0 0
devpts     /dev/pts   devpts  gid=5,mode=620                                    0 0
sysfs      /sys       sysfs   defaults                                          0 0
proc       /proc      proc    defaults                                          0 0
/dev/sdb   /mnt       auto    defaults,noatime,nodiratime,comment=cloudconfig   0 0
/dev/sdc   /mnt2      auto    defaults,noatime,nodiratime,comment=cloudconfig   0 0
```

There is no entry for /dev/xvdb. Ey-Chih Chow

-- Date: Sun, 8 Feb 2015 12:09:37 +0100 Subject: Re: no space left at worker node From: gen.tan...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org

Hi, In fact, I met this problem before; it is a bug in AWS. Which type of machine do you use? If I guess right, you can check the file /etc/fstab: there will be a double mount of /dev/xvdb. If yes, you should: 1. stop hdfs; 2. umount /dev/xvdb at /; 3. restart hdfs. Hope this could be helpful. Cheers, Gen

[original message with the stack trace and df output quoted above]

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: no space left at worker node
Hi, In fact, /dev/sdb is /dev/xvdb, so it seems there is no double-mount problem. However, there is no information about /mnt2: you should check whether /dev/sdc is well mounted or not. Michael's reply is a good solution for this type of problem; you can check his message. Cheers, Gen

On Sun, Feb 8, 2015 at 5:53 PM, ey-chih chow eyc...@hotmail.com wrote: Gen, Thanks for your information. The content of /etc/fstab at the worker node (r3.large) is as quoted above; there is no entry for /dev/xvdb. Ey-Chih Chow

[earlier messages in the thread trimmed]

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
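To answer the follow-up question of how to verify the mount, a couple of standard commands suffice; this is generic shell, nothing Spark-specific:

```
# Is /dev/sdc (a.k.a. /dev/xvdc) mounted anywhere?
mount | grep -E 'sdc|xvdc'

# Does /mnt2 exist, and how much space does it have?
df -h /mnt2
```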
RE: no space left at worker node
Thanks Gen. How can I check if /dev/sdc is well mounted or not? In general, the problem shows up when I submit the second or third job; the first job I submit will most likely succeed. Ey-Chih Chow

Date: Sun, 8 Feb 2015 18:18:03 +0100 Subject: Re: no space left at worker node From: gen.tan...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org

Hi, In fact, /dev/sdb is /dev/xvdb, so it seems there is no double-mount problem. However, there is no information about /mnt2: you should check whether /dev/sdc is well mounted or not. Michael's reply is a good solution for this type of problem; you can check his message. Cheers, Gen

[earlier messages in the thread trimmed]

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: no space left at worker node
Thanks Michael. I didn't edit core-site.xml; we use the default one. I only saw hadoop.tmp.dir in core-site.xml, pointing to /mnt/ephemeral-hdfs. How can I edit the config file? Best regards, Ey-Chih

Date: Sun, 8 Feb 2015 16:51:32 +0000 From: m_albert...@yahoo.com To: gen.tan...@gmail.com; eyc...@hotmail.com CC: user@spark.apache.org Subject: Re: no space left at worker node

You might want to take a look in core-site.xml and see what is listed as the usable directories (hadoop.tmp.dir, fs.s3.buffer.dir). It seems that on EC2 the root disk is relatively small (8G), but the config files list an /mnt directory under it. Somehow the system doesn't balance between the very small space it has under the root disk and the larger disks, so the root disk fills up while the others are unused. At my site, we wrote a boot script to edit these problems out of the config before Hadoop starts. -Mike

From: gen tang gen.tan...@gmail.com To: ey-chih chow eyc...@hotmail.com Cc: user@spark.apache.org Sent: Sunday, February 8, 2015 6:09 AM Subject: Re: no space left at worker node

[Gen's reply and the original message with the stack trace and df output quoted above]

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
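For readers wondering what such a boot-script edit targets: the two properties Mike names live in core-site.xml. A sketch of what the edited result might look like; /mnt/ephemeral-hdfs comes from Ey-Chih's message above, while the s3 buffer path is an assumed example:

```
<!-- core-site.xml: keep Hadoop scratch space off the small 8G root disk -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/ephemeral-hdfs</value>
</property>
<property>
  <name>fs.s3.buffer.dir</name>
  <value>/mnt/s3-buffer</value> <!-- assumed path for illustration -->
</property>
```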
Re: Mesos coarse mode not working (fine grained does)
I wasn't thorough; the complete stderr includes:

```
g++: /usr/lib64/libaprutil-1.so: No such file or directory
g++: /usr/lib64/libapr-1.so: No such file or directoryn
```

(including that trailing 'n'). Though I can't figure out how the process indirection goes from the frontend Spark application to the Mesos executors, or where this shared-library error comes from. Hope someone can shed some light. Thanks

On 08 Feb 2015, at 14:15, Hans van den Bogert hansbog...@gmail.com wrote: Hi, I'm trying to get coarse mode to work under Mesos (0.21.0)... [original message quoted above]

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Mesos coarse mode not working (fine grained does)
Hi there, It looks like the attempt to launch the executor (or one of the helper processes, like the fetcher that fetches the URIs) was failing because of the dependency problem you see. Your mesos-slave shouldn't be able to run at all in that state, though. Were you running a 0.20.0 slave and upgraded to 0.21.0? We introduced the dependencies on libapr and libsvn in Mesos 0.21.0. What's the stdout for the task like? Tim

On Mon, Feb 9, 2015 at 4:10 AM, Hans van den Bogert hansbog...@gmail.com wrote: I wasn't thorough; the complete stderr includes:

g++: /usr/lib64/libaprutil-1.so: No such file or directory
g++: /usr/lib64/libapr-1.so: No such file or directoryn

(including that trailing 'n') [rest of the thread quoted above]

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
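If the missing libraries are indeed the cause, installing the apr/subversion runtime packages on each slave is the usual remedy. The package names below are a guess for RHEL/CentOS-style distributions and may differ on yours:

```
# Install the shared libraries Mesos 0.21.0 links against
sudo yum install -y apr apr-util subversion
```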
Re: Spark concurrency question
I think I have this right: You will run one executor per application per worker. Generally there is one worker per machine, and it manages all of the machine's resources. So if you want one app to use this whole machine, you need to ask for 48G and 24 cores. That's better than splitting up the resources such that no executor can use more than 4G. (However, with big heaps (>32G) it can make sense to limit the size of an executor, so, for example, you could configure 3 workers per machine, each controlling 8 cores and 16G, and ask for smaller executors. Still, I don't think it would make sense to run 12 workers per machine here.)

10 tasks (1 per partition) will execute. They are generally assigned to favor data locality, but here everything is local. If you had 3 executors of 8 cores, I'm not sure it's guaranteed to balance, but it should use at least 2 executors, since there are 10 tasks and 8*3 = 24 slots.

In your initial scenario, I think it may be waiting because the single worker has all of its cores devoted to your first app's single executor. You can ask for fewer cores in each spark-shell.

Not sure what you mean about threads. Yes, of course threads are used within one JVM / executor. It's not an executor per partition; it's a task per partition, and 1 executor per application per worker (and usually 1 worker per machine, but not always). One task executes serially in one thread, and as many tasks as there are slots can run concurrently; that's 1 slot per core that the executor is using. I suppose in theory you could write a function that starts its own threads too, but that's not generally a good idea or necessary.

Did you read the docs on the site?
http://spark.apache.org/docs/latest/cluster-overview.html
http://spark.apache.org/docs/latest/spark-standalone.html

On Sun, Feb 8, 2015 at 7:18 PM, java8964 java8...@hotmail.com wrote: Hi, I have some questions about how Spark runs jobs concurrently. For example, suppose I set up Spark on one standalone test box, which has 24 cores and 64G memory. I set the worker memory to 48G and the executor memory to 4G, and use spark-shell to run some jobs. Here is what confuses me:

1) Does the above setting mean that I can have up to 12 executors running on this box at the same time?

2) Let's assume that I want to do a line count of one 1280M HDFS file, which has 10 blocks at 128M per block. In this case, when the Spark program starts to run, will it kick off one executor using 10 threads to read these 10 blocks of HDFS data, or 10 executors reading one block each? Or something else? I read the Apache Spark documentation, so I know this 1280M HDFS file will be split into 10 partitions. But how the executors run them is not clear to me.

3) In my test case, I started one spark-shell to run a very expensive job. I saw in the Spark web UI that 8 stages were generated, with 200 to 400 tasks in each stage, and the tasks started to run. At this time, I started another spark-shell to connect to the master and tried to run a small Spark program. The spark-shell showed my new small program in a wait status for resources. Why? And what kind of resources is it waiting for? If it is waiting for memory, does this mean that there are 12 concurrent tasks running in the first program, taking 12 * 4G = 48G of the memory given to the worker, so that no more resources are available? If so, in this case, is one running task one executor?

4) In MapReduce, the count of map and reduce tasks is the resource used by the cluster. My understanding is that Spark uses multiple threads instead of individual JVM processes. In this case, is the executor using its 4G heap to run multiple threads? My real question is whether each executor corresponds to one RDD partition, or whether an executor can spawn a thread per RDD partition. On the other hand, how does the worker decide how many executors to create?

If there is any online document answering the above questions, please let me know. I searched the Apache Spark site but couldn't find it. Thanks, Yong

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
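To make the "ask for fewer cores in each spark-shell" suggestion concrete, here is a sketch of launching the two shells so they split the box; the master host and the exact values are illustrative:

```
# First shell: cap it at 12 cores and half the worker memory,
# leaving resources for other applications on the same worker.
spark-shell --master spark://<master>:7077 \
  --conf spark.cores.max=12 \
  --conf spark.executor.memory=24g

# Second shell can now acquire the remaining 12 cores
# instead of waiting for the first application to finish.
spark-shell --master spark://<master>:7077 \
  --conf spark.cores.max=12 \
  --conf spark.executor.memory=24g
```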
Spark concurrency question
Hi, I have some questions about how Spark runs jobs concurrently. For example, suppose I set up Spark on one standalone test box, which has 24 cores and 64G memory. I set the worker memory to 48G and the executor memory to 4G, and use spark-shell to run some jobs. Here is what confuses me:

1) Does the above setting mean that I can have up to 12 executors running on this box at the same time?

2) Let's assume that I want to do a line count of one 1280M HDFS file, which has 10 blocks at 128M per block. In this case, when the Spark program starts to run, will it kick off one executor using 10 threads to read these 10 blocks of HDFS data, or 10 executors reading one block each? Or something else? I read the Apache Spark documentation, so I know this 1280M HDFS file will be split into 10 partitions. But how the executors run them is not clear to me.

3) In my test case, I started one spark-shell to run a very expensive job. I saw in the Spark web UI that 8 stages were generated, with 200 to 400 tasks in each stage, and the tasks started to run. At this time, I started another spark-shell to connect to the master and tried to run a small Spark program. The spark-shell showed my new small program in a wait status for resources. Why? And what kind of resources is it waiting for? If it is waiting for memory, does this mean that there are 12 concurrent tasks running in the first program, taking 12 * 4G = 48G of the memory given to the worker, so that no more resources are available? If so, in this case, is one running task one executor?

4) In MapReduce, the count of map and reduce tasks is the resource used by the cluster. My understanding is that Spark uses multiple threads instead of individual JVM processes. In this case, is the executor using its 4G heap to run multiple threads? My real question is whether each executor corresponds to one RDD partition, or whether an executor can spawn a thread per RDD partition. On the other hand, how does the worker decide how many executors to create?

If there is any online document answering the above questions, please let me know. I searched the Apache Spark site but couldn't find it. Thanks, Yong
Re: [GraphX] Excessive value recalculations during aggregateMessages cycles
I changed:

```
curGraph = curGraph.outerJoinVertices(curMessages)(
  (vid, vertex, message) => vertex.process(message.getOrElse(List[Message]()), ti)
).cache()
```

to:

```
curGraph = curGraph.outerJoinVertices(curMessages)(
  (vid, vertex, message) => (vertex, message.getOrElse(List[Message]()))
).mapVertices( (x, y) => y._1.process( y._2, ti ) ).cache()
```

So the call to the 'process' method was moved out of outerJoinVertices and into a separate mapVertices call, and the problem went away. Now, 'process' is only called once, during the correct cycle. So it would appear that outerJoinVertices caches the closure, to be recalculated if needed again, while mapVertices actually caches the derived values. Is this a bug or a feature? Kyle

On Sat, Feb 7, 2015 at 11:44 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote: I'm trying to set up a simple iterative message/update problem in GraphX (Spark 1.2.0), but I'm running into issues with the caching and re-calculation of data. I'm trying to follow the example found in the Pregel implementation of materializing and caching messages and graphs and then unpersisting them after the next cycle has been done. It doesn't seem to be working: every cycle gets progressively slower, and it seems as if more and more of the values are being re-calculated despite my attempts to cache them. The code:

```
var oldMessages : VertexRDD[List[Message]] = null
var oldGraph : Graph[MyVertex, MyEdge] = null
curGraph = curGraph.mapVertices((x, y) => y.init())
for (i <- 0 to cycle_count) {
  val curMessages = curGraph.aggregateMessages[List[Message]](x => {
    // send messages
    ...
  }, (x, y) => {
    // collect messages into lists
    val out = x ++ y
    out
  }).cache()
  curMessages.count()
  val ti = i
  oldGraph = curGraph
  curGraph = curGraph.outerJoinVertices(curMessages)(
    (vid, vertex, message) => vertex.process(message.getOrElse(List[Message]()), ti)
  ).cache()
  curGraph.vertices.count()
  oldGraph.unpersistVertices(blocking = false)
  oldGraph.edges.unpersist(blocking = false)
  oldGraph = curGraph
  if (oldMessages != null) {
    oldMessages.unpersist(blocking = false)
  }
  oldMessages = curMessages
}
```

The MyVertex.process method takes the list of incoming messages, averages them, and returns a new MyVertex object. I've also set it up to append the cycle number (the second argument) into a log file named after the vertex. What ends up getting dumped into the log file for every vertex (in the exact same pattern) is:

```
Cycle: 0
Cycle: 1
Cycle: 0
Cycle: 2
Cycle: 0
Cycle: 0
Cycle: 1
Cycle: 3
Cycle: 0
Cycle: 0
Cycle: 1
Cycle: 0
Cycle: 0
Cycle: 1
Cycle: 2
Cycle: 4
Cycle: 0
Cycle: 0
Cycle: 1
Cycle: 0
Cycle: 0
Cycle: 1
Cycle: 2
Cycle: 0
Cycle: 0
Cycle: 1
Cycle: 0
Cycle: 0
Cycle: 1
Cycle: 2
Cycle: 3
Cycle: 5
```

Any ideas about what I might be doing wrong with the caching? And how can I avoid re-calculating so many of the values? Kyle
Re: no space left at worker node
Hi, I am sorry, I made a mistake: r3.large has only one SSD, which is mounted at /mnt, so there is no /dev/sdc. In fact, the problem is that there is no space under the / directory. So you should check whether your application writes data under that directory (for instance, saving files with file:///). If not, you can use `watch du -sh` during the run to figure out which directory is expanding. Normally only the /mnt directory, which is backed by the SSD, expands significantly, because the HDFS data is saved there. Then you can find the directory that caused the no-space problem and work out the specific reason. Cheers, Gen

On Sun, Feb 8, 2015 at 10:45 PM, ey-chih chow eyc...@hotmail.com wrote: Thanks Gen. How can I check if /dev/sdc is well mounted or not? In general, the problem shows up when I submit the second or third job; the first job I submit will most likely succeed. Ey-Chih Chow

[earlier messages in the thread, including the original stack trace and df output, quoted above]

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark concurrency question
On Sun, Feb 8, 2015 at 10:26 PM, java8964 java8...@hotmail.com wrote:

In a standalone one-box environment, if I want to use all 48G of memory allocated to the worker for my application, I should ask for 48G of memory for the executor in the spark-shell, right? Because 48G is too big for a JVM heap in the normal case, I can and should consider starting multiple workers on one box, to lower the executor memory but still use all 48G.

Yes.

In the Spark documentation, the default for the --cores parameter is "all available cores". Does that mean using all available cores across all workers, even in a cluster environment? If so, in the default case, if one client submits a huge job, will it use all the available cores of the cluster for all the tasks it generates?

Have a look at how cores work in standalone mode: http://spark.apache.org/docs/latest/job-scheduling.html

One thing that is still not clear: in the example I gave, if 10 tasks (1 per partition) will execute, but there is one executor per application, then, assuming the worker memory is set to 48G, the executor memory is set to 4G, and I use one spark-shell connected to the master to submit my application, I have the following two questions:

1) How many executors will be created on this box (or in the cluster, if it is running in a cluster)? I don't see any Spark configuration for setting the number of executors in spark-shell. If it is more than one, how is this number calculated?

Again from http://spark.apache.org/docs/latest/job-scheduling.html: for standalone mode the default should be 1 executor per worker, but you can change that.

2) Do you mean that one partition (or the task for it) will be run by one executor? Will that executor run its tasks sequentially, with job concurrency coming from multiple executors running simultaneously?

A partition maps to a task, which is computed serially. Tasks are executed in parallel in an executor, which can execute many tasks at once. No, parallelism does not (only) come from running many executors.

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: no space left at worker node
Hi Gen, Thanks. I save my logs in a file under /var/log; this is the only place where I save data. Will the problem go away if I use a better machine? Best regards, Ey-Chih Chow

Date: Sun, 8 Feb 2015 23:32:27 +0100 Subject: Re: no space left at worker node From: gen.tan...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org

Hi, I am sorry, I made a mistake: r3.large has only one SSD, which is mounted at /mnt, so there is no /dev/sdc. In fact, the problem is that there is no space under the / directory. So you should check whether your application writes data under that directory (for instance, saving files with file:///). If not, you can use `watch du -sh` during the run to figure out which directory is expanding. Normally only the /mnt directory, which is backed by the SSD, expands significantly, because the HDFS data is saved there. Then you can find the directory that caused the no-space problem and work out the specific reason. Cheers, Gen

[earlier messages in the thread, including the original stack trace and df output, quoted above]

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Can't access remote Hive table from spark
Hi Lian, Will the latest 0.14.0 version of Hive, which is installed by Ambari 1.7.0 by default, be supported by the next release of Spark? Regards,

-- Original -- From: Cheng Lian <lian.cs@gmail.com> Send time: Friday, Feb 6, 2015 9:02 AM To: guxiaobo1...@qq.com; user@spark.apache.org Subject: Re: Can't access remote Hive table from spark

Please note that Spark 1.2.0 only supports Hive 0.13.1 or 0.12.0; no other versions are supported. Best, Cheng

On 1/25/15 12:18 AM, guxiaobo1982 wrote: Hi, I built and started a single-node standalone Spark 1.2.0 cluster along with a single-node Hive 0.14.0 instance installed by Ambari 1.7.0. On the Spark and Hive node I can create and query tables inside Hive, and from remote machines I can submit the SparkPi example to the Spark master. But I failed to run the following example code:

```
public class SparkTest {
    public static void main(String[] args) {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";
        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);
        // sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        // sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");
        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");
        sc.close();
    }
}
```

```
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src
        at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
        at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)
        at org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:253)
        at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)
        at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:141)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)
        at org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:253)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:143)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$5.applyOrElse(Analyzer.scala:138)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:162)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
```
Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets
Hi, The problem still exists. Would any experts take a look at this? Thanks, Sun. fightf...@163.com

From: fightf...@163.com Date: 2015-02-06 17:54 To: user; dev Subject: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

Hi all, Recently we caught performance issues when using Spark 1.2.0 to read data from HBase and do some summary work. Our scenario is: read large data sets from HBase (maybe 100G+), form an hbaseRDD, transform it to a SchemaRDD, group by and aggregate the data to produce a few much smaller summary data sets, and load those into HBase (Phoenix). Our major issue: aggregating the large datasets into the summary data sets consumes too much time (1 hour+), which should not be such bad performance. We have the dump file attached, and the stack traces from jstack look like the following. From the stack traces and the dump file we can identify that processing large datasets causes frequent AppendOnlyMap growing, leading to a huge map entry size. We looked at the source code of org.apache.spark.util.collection.AppendOnlyMap and found that the map is initialized with a capacity of 64. That would be too small for our use case.

So the questions are: Has anyone encountered such issues before? How were they resolved? I cannot find any JIRA issues for such problems; if someone has seen one, please kindly let us know. A more specific question: is there any possibility for the user to define the map capacity in Spark? If so, please tell us how to achieve that. Best thanks, Sun.

```
Thread 22432: (state = IN_JAVA)
 - org.apache.spark.util.collection.AppendOnlyMap.growTable() @bci=87, line=224 (Compiled frame; information may be imprecise)
 - org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.growTable() @bci=1, line=38 (Interpreted frame)
 - org.apache.spark.util.collection.AppendOnlyMap.incrementSize() @bci=22, line=198 (Compiled frame)
 - org.apache.spark.util.collection.AppendOnlyMap.changeValue(java.lang.Object, scala.Function2) @bci=201, line=145 (Compiled frame)
 - org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(java.lang.Object, scala.Function2) @bci=3, line=32 (Compiled frame)
 - org.apache.spark.util.collection.ExternalSorter.insertAll(scala.collection.Iterator) @bci=141, line=205 (Compiled frame)
 - org.apache.spark.shuffle.sort.SortShuffleWriter.write(scala.collection.Iterator) @bci=74, line=58 (Interpreted frame)
 - org.apache.spark.scheduler.ShuffleMapTask.runTask(org.apache.spark.TaskContext) @bci=169, line=68 (Interpreted frame)
 - org.apache.spark.scheduler.ShuffleMapTask.runTask(org.apache.spark.TaskContext) @bci=2, line=41 (Interpreted frame)
 - org.apache.spark.scheduler.Task.run(long) @bci=77, line=56 (Interpreted frame)
 - org.apache.spark.executor.Executor$TaskRunner.run() @bci=310, line=196 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=744 (Interpreted frame)

Thread 22431: (state = IN_JAVA)
 - (identical stack to thread 22432 above)
```

fightf...@163.com
Attachment: dump.png (42K)
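The AppendOnlyMap capacity is not a user-facing configuration in Spark 1.2, but the per-task map pressure can often be reduced from the application side by increasing shuffle parallelism, so each task aggregates a smaller slice. A sketch in Scala; hbaseRdd and extractKey are hypothetical stand-ins for the real pipeline, and 512 is an assumed partition count:

```
// Sketch only: spread the aggregation over more, smaller tasks so each
// task's in-memory AppendOnlyMap holds fewer entries before spilling.
val counts = hbaseRdd
  .map(row => (extractKey(row), 1L)) // extractKey is a hypothetical helper
  .reduceByKey(_ + _, 512)           // 512 shuffle partitions: assumed value
```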
Error when running example (pi.py)
```
Traceback (most recent call last):
  File "pi.py", line 29, in <module>
    sc = SparkContext(appName="PythonPi")
  File "/home/ashish/Downloads/spark-1.1.0-bin-hadoop2.4/python/pyspark/context.py", line 104, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/home/ashish/Downloads/spark-1.1.0-bin-hadoop2.4/python/pyspark/context.py", line 211, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/home/ashish/Downloads/spark-1.1.0-bin-hadoop2.4/python/pyspark/java_gateway.py", line 48, in launch_gateway
    proc = Popen(command, stdout=PIPE, stdin=PIPE, preexec_fn=preexec_func)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
```
Re: WebUI on yarn through ssh tunnel affected by AmIpfilter
Just to add why tunneling is sometimes not good practice: other ports/apps may depend on processes running on different ports. Say a web app running on port 8080 pulls info from other processes through a REST API; that will fail here, since you only tunnel port 8080, and hence the UI/data will look ugly. On 9 Feb 2015 11:57, Akhil Das ak...@sigmoidanalytics.com wrote: Just make sure all ports (0-65535) are accessible across your cluster. You may also want to open these ports for your IP address instead of tunneling: 8080, 8081, 18080, 1, 50030, 50070, 60070, 4040-4045 Thanks Best Regards On Sat, Feb 7, 2015 at 10:38 AM, yangqch davidyang...@gmail.com wrote: Hi folks, I am new to Spark. I just got Spark 1.2 to run on EMR AMI 3.3.1 (Hadoop 2.4). I ssh to the EMR master node and submit the job or start the shell. Everything runs well except the web UI. In order to see the UI, I used an ssh tunnel which forwards a port on my dev machine to the web UI port on the EMR master node. When I open the web UI at the very beginning of the application (during the Spark launch time), the web UI is as nice as shown in many Spark docs. However, once the YARN AmIpFilter starts to work, the web UI becomes very ugly: no images can be displayed, only text is shown (just like viewing it in lynx). Meanwhile, the Spark shell prints: "amfilter.AmIpFilter (AmIpFilter.java:doFilter(157)) - Could not find proxy-user cookie, so user will not be set". Can anyone give me some help? Thank you!
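If you go the open-ports route instead of tunneling, a quick reachability check from the dev machine might look like this (plain Python; the hostname is a placeholder, and the port list is a subset of the ones Akhil mentions):

    import socket

    host = "emr-master.example.com"  # placeholder for your master's address
    for port in (8080, 8081, 18080, 4040):
        s = socket.socket()
        s.settimeout(2)
        try:
            s.connect((host, port))
            print(port, "reachable")
        except socket.error as e:
            print(port, "blocked:", e)
        finally:
            s.close()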
[MLlib] Performance issues when building GBM models
Hi All, I wonder if anyone else has some experience building a Gradient Boosted Trees model using spark/mllib? I have noticed when building decent-size models that the process slows down over time. We observe that the time to build tree n is approximately a constant time longer than the time to build tree n-1, i.e. t(n) = t(n-1) + const. The implication is that the total build time goes as something like N^2, where N is the total number of trees. I would expect the algorithm to be approximately linear in total time (i.e. each boosting iteration takes roughly the same time to complete). So I have a couple of questions: 1. Is this behaviour expected, or consistent with what others are seeing? 2. Does anyone know if there are tuning parameters (e.g. in the boosting strategy, or tree strategy) that may be impacting this? All aspects of the build seem to slow down as I go. Here's a random example culled from the logs, from the beginning and end of the model build:
15/02/09 17:22:11 INFO scheduler.DAGScheduler: Job 42 finished: count at DecisionTreeMetadata.scala:111, took 0.077957 s
15/02/09 19:44:01 INFO scheduler.DAGScheduler: Job 7954 finished: count at DecisionTreeMetadata.scala:111, took 5.495166 s
Any thoughts or advice, or even suggestions on where to dig for more info, would be welcome. thanks chris Christopher Thom QUANTIUM Level 25, 8 Chifley, 8-12 Chifley Square Sydney NSW 2000 T: +61 2 8222 3577 F: +61 2 9292 6444 W: www.quantium.com.au
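For what it's worth, the observation t(n) = t(n-1) + const already implies the N^2 total; a back-of-the-envelope check in plain Python (t0 and c are made-up numbers, purely to show the shape):

    # If each boosting iteration takes a constant amount longer than the last,
    # t(n) = t0 + c*n, the total is T(N) = N*t0 + c*N*(N+1)/2, i.e. O(N^2).
    def total_time(num_trees, t0=1.0, c=0.05):
        return sum(t0 + c * n for n in range(1, num_trees + 1))

    for n in (100, 200, 400):
        print(n, total_time(n))
    # Once c*n dominates t0, doubling the tree count roughly quadruples the
    # total -- the same shape as the per-job times growing from ~0.08 s to
    # ~5.5 s in the logs above.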
RE: no space left at worker node
Is there any way we can disable Spark copying the jar file to the corresponding directory? I have a fat jar and it is already copied to the worker nodes using the command copy-dir. Why does Spark need to save the jar to ./spark/work/appid each time a job gets started? Ey-Chih Chow Date: Sun, 8 Feb 2015 20:09:32 -0800 Subject: Re: no space left at worker node From: 2dot7kel...@gmail.com To: eyc...@hotmail.com CC: gen.tan...@gmail.com; user@spark.apache.org I guess you may set the parameters below to clean the directories: spark.worker.cleanup.enabled, spark.worker.cleanup.interval, spark.worker.cleanup.appDataTtl. They are described here: http://spark.apache.org/docs/1.2.0/spark-standalone.html Kelvin On Sun, Feb 8, 2015 at 5:15 PM, ey-chih chow eyc...@hotmail.com wrote: I found the problem: for each application, the Spark worker node saves the corresponding stdout and stderr under ./spark/work/appid, where appid is the id of the application. If I run several applications in a row, it will run out of space. In my case, the disk usage under ./spark/work/ is as follows:
1689784 ./app-20150208203033-0002/0
1689788 ./app-20150208203033-0002
40324 ./driver-20150208180505-0001
1691400 ./app-20150208180509-0001/0
1691404 ./app-20150208180509-0001
40316 ./driver-20150208203030-0002
40320 ./driver-20150208173156-
1649876 ./app-20150208173200-/0
1649880 ./app-20150208173200-
5152036 .
Any suggestion how to resolve it? Thanks. Ey-Chih Chow From: eyc...@hotmail.com To: gen.tan...@gmail.com CC: user@spark.apache.org Subject: RE: no space left at worker node Date: Sun, 8 Feb 2015 15:25:43 -0800 By the way, the input and output paths of the job are all in S3; I did not use HDFS paths as input or output. Best regards, Ey-Chih Chow From: eyc...@hotmail.com To: gen.tan...@gmail.com CC: user@spark.apache.org Subject: RE: no space left at worker node Date: Sun, 8 Feb 2015 14:57:15 -0800 Hi Gen, Thanks. I save my logs in a file under /var/log. This is the only place I save data. Will the problem go away if I use a better machine? Best regards, Ey-Chih Chow Date: Sun, 8 Feb 2015 23:32:27 +0100 Subject: Re: no space left at worker node From: gen.tan...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org Hi, I am sorry, I made a mistake: r3.large has only one SSD, which is mounted at /mnt, so there is no /dev/sdc. In fact, the problem is that there is no space under the / directory. So you should check whether your application writes data under this directory (for instance, saving files with file:///). If not, you can use watch du -sh during the run to figure out which directory is expanding. Normally only the /mnt directory, which is backed by the SSD, expands significantly, because the HDFS data is saved there. Then you can find the directory that caused the no-space problem and work out the specific reason. Cheers Gen On Sun, Feb 8, 2015 at 10:45 PM, ey-chih chow eyc...@hotmail.com wrote: Thanks Gen. How can I check whether /dev/sdc is well mounted or not? In general, the problem shows up when I submit the second or third job. The first job I submit will most likely succeed. Ey-Chih Chow Date: Sun, 8 Feb 2015 18:18:03 +0100 Subject: Re: no space left at worker node From: gen.tan...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org Hi, In fact, /dev/sdb is /dev/xvdb. It seems that there is no problem about a double mount. However, there is no information about /mnt2; you should check whether /dev/sdc is well mounted or not. Michael's reply is a good solution for this type of problem; you can check his site. Cheers Gen On Sun, Feb 8, 2015 at 5:53 PM, ey-chih chow eyc...@hotmail.com wrote: Gen, Thanks for your information. The content of /etc/fstab at the worker node (r3.large) is:
#LABEL=/ / ext4 defaults,noatime 1 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/sdb /mnt auto defaults,noatime,nodiratime,comment=cloudconfig 0 0
/dev/sdc /mnt2 auto defaults,noatime,nodiratime,comment=cloudconfig 0 0
There is no entry for /dev/xvdb. Ey-Chih Chow Date: Sun, 8 Feb 2015 12:09:37 +0100 Subject: Re: no space left at worker node From: gen.tan...@gmail.com To: eyc...@hotmail.com CC: user@spark.apache.org Hi, In fact, I met this problem before; it is a bug of AWS. Which type of machine do you use? If I guess well, you can check the file /etc/fstab: there would be a double mount of /dev/xvdb. If yes, you should 1. stop hdfs, 2. umount /dev/xvdb at /, 3. restart hdfs. Hope this could be helpful. Cheers Gen On Sun, Feb 8, 2015 at 8:16 AM, ey-chih chow eyc...@hotmail.com wrote: Hi, I submitted a spark job to an ec2 cluster, using spark-submit. At a worker node, there is an exception
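For reference, the three cleanup properties Kelvin lists are settings of the standalone worker daemon; one illustrative way to set them (values here are examples, not recommendations; see the standalone docs linked above for the defaults) is via SPARK_WORKER_OPTS in each worker's conf/spark-env.sh:

    # Illustrative conf/spark-env.sh fragment on each worker node:
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"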
Re: Installing a python library along with ec2 cluster
Hi, I am very new to both Spark and AWS. Say I want to install pandas on EC2 (pip install pandas). How do I create the image with the above library so that it can be used from pyspark? Thanks On Sun, Feb 8, 2015 at 3:03 AM, gen tang gen.tan...@gmail.com wrote: Hi, You can make an image of EC2 with all the Python libraries installed, and create a bash script to export PYTHONPATH in the /etc/init.d/ directory. Then you can launch the cluster with this image and ec2.py. Hope this can be helpful Cheers Gen On Sun, Feb 8, 2015 at 9:46 AM, Chengi Liu chengi.liu...@gmail.com wrote: Hi, I want to install a couple of Python libraries (pip install python_library) which I want to use on a pyspark cluster built using the ec2 scripts. Is there a way to specify these libraries when I am building those EC2 clusters? What's the best way to install these libraries on each EC2 node? Thanks
Re: Installing a python library along with ec2 cluster
You can basically add one function call to install the stuff you want. If you look at the spark-ec2 script, there's a function which does all the setup, named setup_cluster(..): https://github.com/apache/spark/blob/master/ec2/spark_ec2.py#L625. Now, if you want to install a python library (assuming pip is already installed), you can add one more line in the above function, like: ssh(master, opts, "pip install pandas"). This will install it on the master node; the slave_nodes variable has all the info about the slave machines, so you can iterate through it and do the same (see the sketch below). Thanks Best Regards On Sun, Feb 8, 2015 at 2:16 PM, Chengi Liu chengi.liu...@gmail.com wrote: Hi, I want to install a couple of Python libraries (pip install python_library) which I want to use on a pyspark cluster built using the ec2 scripts. Is there a way to specify these libraries when I am building those EC2 clusters? What's the best way to install these libraries on each EC2 node? Thanks
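A minimal sketch of that change (hypothetical code: it assumes spark_ec2.py's ssh(host, opts, command) helper keeps the signature used above, and that slave_nodes holds boto instances with a public_dns_name attribute):

    # Hypothetical addition inside setup_cluster() in ec2/spark_ec2.py:
    # install each library on the master, then on every slave.
    libs_to_install = ["pandas"]  # whatever your pyspark jobs need
    for lib in libs_to_install:
        ssh(master, opts, "pip install %s" % lib)
        for slave in slave_nodes:
            ssh(slave.public_dns_name, opts, "pip install %s" % lib)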
Re: no space left at worker node
Maybe try with local: under the heading of Advanced Dependency Management here: https://spark.apache.org/docs/1.1.0/submitting-applications.html It seems this is what you want. Hope this helps. Kelvin On Sun, Feb 8, 2015 at 9:13 PM, ey-chih chow eyc...@hotmail.com wrote: Is there any way we can disable Spark copying the jar file to the corresponding directory? I have a fat jar and it is already copied to the worker nodes using the command copy-dir. Why does Spark need to save the jar to ./spark/work/appid each time a job gets started? Ey-Chih Chow
RE: no space left at worker node
I found the problem: for each application, the Spark worker node saves the corresponding stdout and stderr under ./spark/work/appid, where appid is the id of the application. If I run several applications in a row, it will run out of space. In my case, the disk usage under ./spark/work/ is as follows:
1689784 ./app-20150208203033-0002/0
1689788 ./app-20150208203033-0002
40324 ./driver-20150208180505-0001
1691400 ./app-20150208180509-0001/0
1691404 ./app-20150208180509-0001
40316 ./driver-20150208203030-0002
40320 ./driver-20150208173156-
1649876 ./app-20150208173200-/0
1649880 ./app-20150208173200-
5152036 .
Any suggestion how to resolve it? Thanks. Ey-Chih Chow
RE: no space left at worker node
By the way, the input and output paths of the job are all in S3; I did not use HDFS paths as input or output. Best regards, Ey-Chih Chow On Sun, Feb 8, 2015 at 8:16 AM, ey-chih chow eyc...@hotmail.com wrote: Hi, I submitted a spark job to an ec2 cluster, using spark-submit. At a worker node, there is an exception of 'no space left on device' as follows.
==
15/02/08 01:53:38 ERROR logging.FileAppender: Error writing stream to file /root/spark/work/app-20150208014557-0003/0/stdout
java.io.IOException: No space left on device
 at java.io.FileOutputStream.writeBytes(Native Method)
 at java.io.FileOutputStream.write(FileOutputStream.java:345)
 at org.apache.spark.util.logging.FileAppender.appendToFile(FileAppender.scala:92)
 at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:72)
 at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
 at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
 at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
 at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
===
The command df showed the following information at the worker node:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 8256920 8256456 0 100% /
tmpfs 7752012 0 7752012 0% /dev/shm
/dev/xvdb 30963708 1729652 27661192 6% /mnt
Does anybody know how to fix this? Thanks. Ey-Chih Chow