[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440090#comment-15440090 ]

Arihanth Jain commented on SPARK-13525:
---------------------------------------

[~sunrui] I have tried setting "spark.sparkr.use.daemon" to false, with no luck. 
I am now working around this by creating an R cluster with the base package 
"parallel" and its makePSOCKcluster function. I believe this gets closer to 
the root cause.
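
For context, a minimal sketch of the PSOCK cluster setup being used (the node 
names and count are illustrative):

{code}
library(parallel)

# Illustrative worker list: passing hostnames here triggers the SSH
# failure shown below, while plain IP addresses work.
nodes <- c("test01.servers.jiffybox.net", "test02.servers.jiffybox.net",
           "test03.servers.jiffybox.net", "test04.servers.jiffybox.net")

cl <- makePSOCKcluster(nodes, outfile = "")  # outfile = "" echoes worker logs
parSapply(cl, seq_along(nodes), function(i) Sys.info()[["nodename"]])
stopCluster(cl)
{code}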

When the nodes are passed by hostname, the workers fail with the following 
error and hang on it forever:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The RSA host key for test02.servers.jiffybox.net has changed,
and the key for the corresponding IP address 134.xxx.xx.xxx
is unchanged. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
Offending key for IP in /root/.ssh/known_hosts:10
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
<RSA Key>.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending key in /root/.ssh/known_hosts:9
RSA host key for test02.servers.jiffybox.net has changed and you have requested 
strict checking.
Host key verification failed.


The same setup works fine and all workers start when the nodes are passed by 
IP address instead of hostname:

starting worker pid=32407 on master.jiffybox.net:11575 at 22:18:50.050
starting worker pid=3523 on master.jiffybox.net:11575 at 22:18:50.464
starting worker pid=2583 on master.jiffybox.net:11575 at 22:18:50.885
starting worker pid=5227 on master.jiffybox.net:11575 at 22:18:51.294

--------------

The above "DNS SPOOFING" issue was simply resolved by removing the matching 
entries from .ssh/known_hosts and recreating them for all nodes "ssh 
root@hostname". This fixed the previous issue and was able to able to create 
socket cluster with 4 nodes (now at port 11977).
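
Roughly, the cleanup amounted to the following sketch (ssh-keygen -R is the 
standard way to drop a stale known_hosts entry; the loop over the node list 
is illustrative):

{code}
# 'nodes' is the worker list from the sketch above. Remove the stale
# host-key entries, then reconnect once per node ("ssh root@hostname")
# so the current keys are recorded again.
for (h in nodes) {
  system(paste("ssh-keygen -R", shQuote(h)))
}
{code}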


starting worker pid=6804 on master.jiffybox.net:11977 at 23:59:23.245
starting worker pid=10257 on master.jiffybox.net:11977 at 23:59:23.668
starting worker pid=9776 on master.jiffybox.net:11977 at 23:59:24.107
starting worker pid=12073 on master.jiffybox.net:11977 at 23:59:24.540

note: Neither the path to Rscript nor any port number was specified.
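
Both can be pinned explicitly if needed; a sketch using the documented 
makePSOCKcluster options (the port and Rscript path here are illustrative):

{code}
# 'nodes' as in the sketch above.
cl <- parallel::makePSOCKcluster(
  nodes,
  port = 11977,                       # master port the workers connect back to
  rscript = "/usr/local/bin/Rscript"  # path to Rscript on the worker nodes
)
{code}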

--------------

Unfortunately, this did not resolve the problem with SparkR. It still fails 
with the existing "java.net.SocketTimeoutException: Accept timed out" issue.
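
For reference, the daemon setting mentioned at the top was disabled along 
these lines (a sketch; passing it through sparkEnvir mirrors the .Rprofile 
quoted below, but the exact mechanism used is an assumption):

{code}
library(SparkR)

# With spark.sparkr.use.daemon = "false", each R worker is launched as a
# fresh Rscript process instead of being forked from the daemon.
sc <- sparkR.init(master = "local[20]",
                  sparkEnvir = list(spark.sparkr.use.daemon = "false"))
{code}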


> SparkR: java.net.SocketTimeoutException: Accept timed out when running any 
> dataframe function
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13525
>                 URL: https://issues.apache.org/jira/browse/SPARK-13525
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>            Reporter: Shubhanshu Mishra
>              Labels: sparkr
>
> I am following the code steps from this example:
> https://spark.apache.org/docs/1.6.0/sparkr.html
> There are multiple issues: 
> 1. The head, summary, and filter methods are not overridden by Spark, so I 
> need to call them using the `SparkR::` namespace.
> 2. When I try to execute the following, I get errors:
> {code}
> $> $R_HOME/bin/R
> R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
> Copyright (C) 2015 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>   Natural language support but running in an English locale
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
> Welcome at Fri Feb 26 16:19:35 2016 
> Attaching package: ‘SparkR’
> The following objects are masked from ‘package:base’:
>     colnames, colnames<-, drop, intersect, rank, rbind, sample, subset,
>     summary, transform
> Launching java with spark-submit command 
> /content/smishra8/SOFTWARE/spark/bin/spark-submit   --driver-memory "50g" 
> sparkr-shell /tmp/RtmpfBQRg6/backend_portc3bc16f09b1b 
> > df <- createDataFrame(sqlContext, iris)
> Warning messages:
> 1: In FUN(X[[i]], ...) :
>   Use Sepal_Length instead of Sepal.Length  as column name
> 2: In FUN(X[[i]], ...) :
>   Use Sepal_Width instead of Sepal.Width  as column name
> 3: In FUN(X[[i]], ...) :
>   Use Petal_Length instead of Petal.Length  as column name
> 4: In FUN(X[[i]], ...) :
>   Use Petal_Width instead of Petal.Width  as column name
> > training <- filter(df, df$Species != "setosa")
> Error in filter(df, df$Species != "setosa") : 
>   no method for coercing this S4 class to a vector
> > training <- SparkR::filter(df, df$Species != "setosa")
> > model <- SparkR::glm(Species ~ Sepal_Length + Sepal_Width, data = training, 
> > family = "binomial")
> 16/02/26 16:26:46 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.net.SocketTimeoutException: Accept timed out
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at 
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>         at java.net.ServerSocket.accept(ServerSocket.java:498)
>         at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:431)
>         at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:62)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:77)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:45)
>         at org.apache.spark.scheduler.Task.run(Task.scala:81)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 16/02/26 16:26:46 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; 
> aborting job
> 16/02/26 16:26:46 ERROR RBackendHandler: fitRModelFormula on 
> org.apache.spark.ml.api.r.SparkRWrappers failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
>   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 
> (TID 1, localhost): java.net.SocketTimeoutException: Accept timed out
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at 
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>         at java.net.ServerSocket.accept(ServerSocket.java:498)
>         at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:431)
>         at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:62)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > 
> {code}
> Even when I try to run the head command on the dataframe, I get a similar 
> error:
> {code}
> > SparkR::head(df)
> 16/02/26 16:32:05 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 2)
> java.net.SocketTimeoutException: Accept timed out
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at 
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>         at java.net.ServerSocket.accept(ServerSocket.java:498)
>         at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:431)
>         at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:62)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:69)
>         at org.apache.spark.scheduler.Task.run(Task.scala:81)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 16/02/26 16:32:05 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1 times; 
> aborting job
> 16/02/26 16:32:05 ERROR RBackendHandler: dfToCols on 
> org.apache.spark.sql.api.r.SQLUtils failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
>   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 
> (TID 2, localhost): java.net.SocketTimeoutException: Accept timed out
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at 
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>         at java.net.ServerSocket.accept(ServerSocket.java:498)
>         at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:431)
>         at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:62)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> {code}
> I have a .Rprofile file in my directory which looks like the following:
> {code:title=.Rprofile|borderStyle=solid}
> # Sample Rprofile.site file 
> # Things you might want to change
> .First <- function(){
>   cat("\nWelcome at", date(), "\n") 
>   SPARK_HOME <- "/content/user/SOFTWARE/spark"
>   .libPaths(c(file.path(SPARK_HOME, "R", "lib"), .libPaths()))
>   library(SparkR)
>   sc <<- sparkR.init(master="local[20]", appName="Model SparkR", 
> sparkHome=SPARK_HOME,
>     sparkEnvir=list(spark.local.dir="./tmp",
>     spark.executor.memory="50g",
>     spark.driver.maxResultSize="50g",
>     spark.driver.memory="50g"))
>   sqlContext <<- sparkRSQL.init(sc)
> }
> .Last <- function(){ 
>   cat("\nGoodbye at ", date(), "\n")
> }
> {code}
> I am using the master branch of Spark since the following commit:
> {code}
> commit 35316cb0b744bef9bcb390411ddc321167f953be
> Author: Yu ISHIKAWA <yuu.ishik...@gmail.com>
> Date:   Thu Feb 25 13:29:10 2016 -0800
> {code}


