[
https://issues.apache.org/jira/browse/SPARK-16299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shivaram Venkataraman updated SPARK-16299:
------------------------------------------
Assignee: Sun Rui (was: Apache Spark)
> Capture errors from R workers in daemon.R to avoid deletion of R session
> temporary directory
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-16299
> URL: https://issues.apache.org/jira/browse/SPARK-16299
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 1.6.2
> Reporter: Sun Rui
> Assignee: Sun Rui
> Fix For: 2.0.0
>
>
> Running SparkR unit tests randomly has the following error:
> Failed
> -------------------------------------------------------------------------
> 1. Error: pipeRDD() on RDDs (@test_rdd.R#428)
> ----------------------------------
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> stage 792.0 failed 1 times, most recent failure: Lost task 0.0 in stage 792.0
> (TID 1493, localhost): org.apache.spark.SparkException: R computation failed
> with
> [1] 1
> [1] 1
> [1] 2
> [1] 2
> [1] 3
> [1] 3
> [1] 2
> [1] 2
> [1] 2
> [1] 2
> [1] 2
> [1] 2
> ignoring SIGPIPE signal
> Calls: source ... <Anonymous> -> lapply -> lapply -> FUN -> writeRaw ->
> writeBin
> Execution halted
> cannot open the connection
> Calls: source ... computeFunc -> FUN -> system2 -> writeLines -> file
> In addition: Warning message:
> In file(con, "w") :
> cannot open file '/tmp/Rtmp0Gr1aU/file2de3efc94b3': No such file or
> directory
> Execution halted
> at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
> at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:49)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> This is related to daemon R worker mode. By default, SparkR launches an R
> daemon worker per executor, and forks R workers from the daemon when
> necessary.
> The problem with forking R workers is that all forked R processes share one
> session temporary directory, as documented at
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/tempfile.html.
> When any forked R worker exits, whether normally or because of an error, R's
> cleanup procedure deletes that temporary directory. This affects the
> still-running forked R workers, because any temporary files they created
> under that directory are removed along with it. It also affects all R workers
> forked from the daemon afterwards: if they use tempdir() or tempfile() to get
> temporary files, they will fail to create them under the already-deleted
> session temporary directory. So for daemon mode to work, this problem must be
> circumvented.
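> The shared-tempdir behavior can be seen with a small sketch (not part of
> daemon.R; Unix-only, since parallel::mcparallel() relies on fork):
> {code}
> # A forked child process inherits the parent's R session, including the
> # per-session temporary directory returned by tempdir().
> library(parallel)
> parent_tmp <- tempdir()
> job <- mcparallel(tempdir())      # fork a child that reports its tempdir
> child_tmp <- mccollect(job)[[1]]
> stopifnot(identical(parent_tmp, child_tmp))  # same directory in both
> {code}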
> In current dameon.R, R workers directly exits skipping the cleanup procedure
> of R so that the shared temporary directory won't be deleted.
> {code}
> source(script)
> # Set SIGUSR1 so that child can exit
> tools::pskill(Sys.getpid(), tools::SIGUSR1)
> parallel:::mcexit(0L)
> {code}
> However, daemon.R has a bug: when any execution error occurs in an R worker,
> R's error handling eventually falls through to the cleanup procedure. try()
> should therefore be used in daemon.R to catch any error in the worker code,
> so that R workers still exit directly:
> {code}
> try(source(script))
> # Set SIGUSR1 so that child can exit
> tools::pskill(Sys.getpid(), tools::SIGUSR1)
> parallel:::mcexit(0L)
> {code}
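> As a minimal illustration of why try() helps (a sketch, not the actual
> daemon.R code):
> {code}
> # try() turns an error raised by the sourced worker code into a value of
> # class "try-error" instead of letting R's error handling fall through to
> # the exit cleanup that deletes the shared session temporary directory.
> result <- try(stop("simulated worker failure"), silent = TRUE)
> stopifnot(inherits(result, "try-error"))
> # Execution continues here, so the worker can still send itself SIGUSR1
> # and exit via parallel:::mcexit() without running the cleanup procedure.
> {code}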
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]