[
https://issues.apache.org/jira/browse/SPARK-12239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072029#comment-15072029
]
Sen Fang commented on SPARK-12239:
----------------------------------
I was recently hit by this issue as well, and I think we can make this part
more robust. It came up for us exactly when attempting to use SparkR from
RStudio/RStudio Server.
[~sunrui] The workaround you suggest absolutely works, but I don't see it on
the page you cited. Has that page been changed?
[~shivaram] I believe the problem is that {{sparkR.init}} always launches
spark-submit in local mode first, e.g.:
{code}
Launching java with spark-submit command spark-submit sparkr-shell
/var/folders/yw/mfqkln8172l93g6yfnt2k2zw0000gp/T//RtmpCwXLoF/backend_port432252948808
{code}
It currently more or less works for YARN when used as documented for {{sparkR.init}}:
{code}
sc <- sparkR.init("yarn-client", "SparkR", "/home/spark", ...
{code}
only because the actual SparkContext is initiated by this later call:
https://github.com/apache/spark/blob/v1.6.0-rc4/R/pkg/R/sparkR.R#L212
However, that call happens after the deploy step you cited above, where Spark
determines the cluster manager from the spark-submit arguments:
https://github.com/apache/spark/blob/835a79d78ee879a3c36dde85e5b3591243bf3957/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L228
Therefore you end up with a functional YARN-based SparkContext, but without
sparkr.zip distributed to the executors.
Shouldn't the fix be as easy as appending the actual master to the spark-submit
command we build inside R? Or am I missing something?
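To make that concrete, here is a rough, untested sketch of the kind of change I
have in mind; the names below ({{sparkMaster}}, {{backendPortFile}},
{{submitArgs}}) are placeholders for illustration, not the actual {{sparkR.R}}
internals:
{code}
# Illustrative sketch only -- variable names are placeholders, not real sparkR.R code.
sparkMaster     <- "yarn-client"               # whatever master was passed to sparkR.init()
backendPortFile <- tempfile("backend_port")    # file the JVM backend writes its port into

# Today the launched command is effectively:
#   spark-submit sparkr-shell <backendPortFile>
# i.e. no --master, so SparkSubmit's deploy step never sees YARN and never ships sparkr.zip.

# Proposed: include the user-supplied master when assembling the spark-submit
# arguments in R, so the deploy step picks the right cluster manager up front:
submitArgs <- c("--master", sparkMaster, "sparkr-shell", backendPortFile)
system2("spark-submit", submitArgs, wait = FALSE)
{code}
That would essentially mirror the working case in the description below, where
{{sparkR --master yarn-client}} hands the master straight to spark-submit.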
> SparkR - Not distributing SparkR module in YARN
> ------------------------------------------------
>
> Key: SPARK-12239
> URL: https://issues.apache.org/jira/browse/SPARK-12239
> Project: Spark
> Issue Type: Bug
> Components: SparkR, YARN
> Affects Versions: 1.5.2, 1.5.3
> Reporter: Sebastian YEPES FERNANDEZ
> Priority: Critical
>
> Hello,
> I am trying to use SparkR in a YARN environment and I have encountered the
> following problem:
> Everything works correctly when using bin/sparkR, but if I try running the
> same jobs by loading SparkR directly from R, it does not work.
> I have managed to track down the cause: when SparkR is launched through R,
> the "SparkR" module is not distributed to the worker nodes.
> I have tried working around this using the setting "spark.yarn.dist.archives",
> but it does not work, as it deploys the file/extracted folder with the
> extension ".zip" while the workers are actually looking for a folder named
> "sparkr".
> Is there currently any way to make this work?
> {code}
> # spark-defaults.conf
> spark.yarn.dist.archives /opt/apps/spark/R/lib/sparkr.zip
> # R
> library(SparkR, lib.loc="/opt/apps/spark/R/lib/")
> sc <- sparkR.init(appName="SparkR", master="yarn-client",
> sparkEnvir=list(spark.executor.instances="1"))
> sqlContext <- sparkRSQL.init(sc)
> df <- createDataFrame(sqlContext, faithful)
> head(df)
> 15/12/09 09:04:24 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1,
> fr-s-cour-wrk3.alidaho.com): java.net.SocketTimeoutException: Accept timed out
> at java.net.PlainSocketImpl.socketAccept(Native Method)
> at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
> {code}
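> (Assuming {{spark.yarn.dist.archives}} honors the Hadoop/YARN {{archive#alias}}
> fragment syntax, one untested variant would be to link the extracted archive
> under the directory name the workers expect:)
> {code}
> # spark-defaults.conf -- untested variant: the "#sparkr" suffix asks YARN to
> # expose the extracted archive in the container under the link name "sparkr"
> spark.yarn.dist.archives /opt/apps/spark/R/lib/sparkr.zip#sparkr
> {code}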
> Container stderr:
> {code}
> 15/12/09 09:04:14 INFO storage.MemoryStore: Block broadcast_1 stored as
> values in memory (estimated size 8.7 KB, free 530.0 MB)
> 15/12/09 09:04:14 INFO r.BufferedStreamThread: Fatal error: cannot open file
> '/hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_000002/sparkr/SparkR/worker/daemon.R':
> No such file or directory
> 15/12/09 09:04:24 ERROR executor.Executor: Exception in task 0.0 in stage 1.0
> (TID 1)
> java.net.SocketTimeoutException: Accept timed out
> at java.net.PlainSocketImpl.socketAccept(Native Method)
> at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
> at java.net.ServerSocket.implAccept(ServerSocket.java:545)
> at java.net.ServerSocket.accept(ServerSocket.java:513)
> at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:426)
> {code}
> Worker node that ran the container:
> {code}
> # ls -la
> /hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_000002
> total 71M
> drwx--x--- 3 yarn hadoop 4.0K Dec 9 09:04 .
> drwx--x--- 7 yarn hadoop 4.0K Dec 9 09:04 ..
> -rw-r--r-- 1 yarn hadoop 110 Dec 9 09:03 container_tokens
> -rw-r--r-- 1 yarn hadoop 12 Dec 9 09:03 .container_tokens.crc
> -rwx------ 1 yarn hadoop 736 Dec 9 09:03
> default_container_executor_session.sh
> -rw-r--r-- 1 yarn hadoop 16 Dec 9 09:03
> .default_container_executor_session.sh.crc
> -rwx------ 1 yarn hadoop 790 Dec 9 09:03 default_container_executor.sh
> -rw-r--r-- 1 yarn hadoop 16 Dec 9 09:03 .default_container_executor.sh.crc
> -rwxr-xr-x 1 yarn hadoop 61K Dec 9 09:04 hadoop-lzo-0.6.0.2.3.2.0-2950.jar
> -rwxr-xr-x 1 yarn hadoop 317K Dec 9 09:04 kafka-clients-0.8.2.2.jar
> -rwx------ 1 yarn hadoop 6.0K Dec 9 09:03 launch_container.sh
> -rw-r--r-- 1 yarn hadoop 56 Dec 9 09:03 .launch_container.sh.crc
> -rwxr-xr-x 1 yarn hadoop 2.2M Dec 9 09:04
> spark-cassandra-connector_2.10-1.5.0-M3.jar
> -rwxr-xr-x 1 yarn hadoop 7.1M Dec 9 09:04 spark-csv-assembly-1.3.0.jar
> lrwxrwxrwx 1 yarn hadoop 119 Dec 9 09:03 __spark__.jar ->
> /hadoop/hdfs/disk03/hadoop/yarn/local/usercache/spark/filecache/361/spark-assembly-1.5.3-SNAPSHOT-hadoop2.7.1.jar
> lrwxrwxrwx 1 yarn hadoop 84 Dec 9 09:03 sparkr.zip ->
> /hadoop/hdfs/disk01/hadoop/yarn/local/usercache/spark/filecache/359/sparkr.zip
> -rwxr-xr-x 1 yarn hadoop 1.8M Dec 9 09:04
> spark-streaming_2.10-1.5.3-SNAPSHOT.jar
> -rwxr-xr-x 1 yarn hadoop 11M Dec 9 09:04
> spark-streaming-kafka-assembly_2.10-1.5.3-SNAPSHOT.jar
> -rwxr-xr-x 1 yarn hadoop 48M Dec 9 09:04
> sparkts-0.1.0-SNAPSHOT-jar-with-dependencies.jar
> drwx--x--- 2 yarn hadoop 46 Dec 9 09:04 tmp
> {code}
> *Working case:*
> {code}
> # sparkR --master yarn-client --num-executors 1
> df <- createDataFrame(sqlContext, faithful)
> head(df)
> eruptions waiting
> 1 3.600 79
> 2 1.800 54
> 3 3.333 74
> 4 2.283 62
> 5 4.533 85
> 6 2.883 55
> {code}
> Worker node that ran the container:
> {code}
> # ls -la
> /hadoop/hdfs/disk04/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1170/container_e44_1445706872927_1170_01_000002/
> total 71M
> drwx--x--- 3 yarn hadoop 4.0K Dec 9 09:14 .
> drwx--x--- 6 yarn hadoop 4.0K Dec 9 09:14 ..
> -rw-r--r-- 1 yarn hadoop 110 Dec 9 09:14 container_tokens
> -rw-r--r-- 1 yarn hadoop 12 Dec 9 09:14 .container_tokens.crc
> -rwx------ 1 yarn hadoop 736 Dec 9 09:14
> default_container_executor_session.sh
> -rw-r--r-- 1 yarn hadoop 16 Dec 9 09:14
> .default_container_executor_session.sh.crc
> -rwx------ 1 yarn hadoop 790 Dec 9 09:14 default_container_executor.sh
> -rw-r--r-- 1 yarn hadoop 16 Dec 9 09:14 .default_container_executor.sh.crc
> -rwxr-xr-x 1 yarn hadoop 61K Dec 9 09:14 hadoop-lzo-0.6.0.2.3.2.0-2950.jar
> -rwxr-xr-x 1 yarn hadoop 317K Dec 9 09:14 kafka-clients-0.8.2.2.jar
> -rwx------ 1 yarn hadoop 6.3K Dec 9 09:14 launch_container.sh
> -rw-r--r-- 1 yarn hadoop 60 Dec 9 09:14 .launch_container.sh.crc
> -rwxr-xr-x 1 yarn hadoop 2.2M Dec 9 09:14
> spark-cassandra-connector_2.10-1.5.0-M3.jar
> -rwxr-xr-x 1 yarn hadoop 7.1M Dec 9 09:14 spark-csv-assembly-1.3.0.jar
> lrwxrwxrwx 1 yarn hadoop 119 Dec 9 09:14 __spark__.jar ->
> /hadoop/hdfs/disk05/hadoop/yarn/local/usercache/spark/filecache/368/spark-assembly-1.5.3-SNAPSHOT-hadoop2.7.1.jar
> lrwxrwxrwx 1 yarn hadoop 84 Dec 9 09:14 sparkr ->
> /hadoop/hdfs/disk04/hadoop/yarn/local/usercache/spark/filecache/367/sparkr.zip
> -rwxr-xr-x 1 yarn hadoop 1.8M Dec 9 09:14
> spark-streaming_2.10-1.5.3-SNAPSHOT.jar
> -rwxr-xr-x 1 yarn hadoop 11M Dec 9 09:14
> spark-streaming-kafka-assembly_2.10-1.5.3-SNAPSHOT.jar
> -rwxr-xr-x 1 yarn hadoop 48M Dec 9 09:14
> sparkts-0.1.0-SNAPSHOT-jar-with-dependencies.jar
> drwx--x--- 2 yarn hadoop 46 Dec 9 09:14 tmp
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)