[ https://issues.apache.org/jira/browse/SPARK-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682472#comment-14682472 ]
Alex Liu commented on SPARK-4300: --------------------------------- I created SPARK-9844 for the unsolved issue. > Race condition during SparkWorker shutdown > ------------------------------------------ > > Key: SPARK-4300 > URL: https://issues.apache.org/jira/browse/SPARK-4300 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Affects Versions: 1.1.0 > Reporter: Alex Liu > Assignee: Sean Owen > Priority: Minor > Fix For: 1.2.2, 1.3.1, 1.4.0 > > > When a shark job is done. there are some error message as following show in > the log > {code} > INFO 22:10:41,635 SparkMaster: > akka.tcp://sparkdri...@ip-172-31-11-204.us-west-1.compute.internal:57641 got > disassociated, removing it. > INFO 22:10:41,640 SparkMaster: Removing app app-20141106221014-0000 > INFO 22:10:41,687 SparkMaster: Removing application > Shark::ip-172-31-11-204.us-west-1.compute.internal > INFO 22:10:41,710 SparkWorker: Asked to kill executor > app-20141106221014-0000/0 > INFO 22:10:41,712 SparkWorker: Runner thread for executor > app-20141106221014-0000/0 interrupted > INFO 22:10:41,714 SparkWorker: Killing process! > ERROR 22:10:41,738 SparkWorker: Error writing stream to file > /var/lib/spark/work/app-20141106221014-0000/0/stdout > ERROR 22:10:41,739 SparkWorker: java.io.IOException: Stream closed > ERROR 22:10:41,739 SparkWorker: at > java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162) > ERROR 22:10:41,740 SparkWorker: at > java.io.BufferedInputStream.read1(BufferedInputStream.java:272) > ERROR 22:10:41,740 SparkWorker: at > java.io.BufferedInputStream.read(BufferedInputStream.java:334) > ERROR 22:10:41,740 SparkWorker: at > java.io.FilterInputStream.read(FilterInputStream.java:107) > ERROR 22:10:41,741 SparkWorker: at > org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70) > ERROR 22:10:41,741 SparkWorker: at > org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39) > ERROR 22:10:41,741 SparkWorker: at > org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) > ERROR 22:10:41,742 SparkWorker: at > org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) > ERROR 22:10:41,742 SparkWorker: at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311) > ERROR 22:10:41,742 SparkWorker: at > org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38) > INFO 22:10:41,838 SparkMaster: Connected to Cassandra cluster: 4299 > INFO 22:10:41,839 SparkMaster: Adding host 172.31.11.204 (Analytics) > INFO 22:10:41,840 SparkMaster: New Cassandra host /172.31.11.204:9042 added > INFO 22:10:41,841 SparkMaster: Adding host 172.31.11.204 (Analytics) > INFO 22:10:41,842 SparkMaster: Adding host 172.31.11.204 (Analytics) > INFO 22:10:41,852 SparkMaster: > akka.tcp://sparkdri...@ip-172-31-11-204.us-west-1.compute.internal:57641 got > disassociated, removing it. > INFO 22:10:41,853 SparkMaster: > akka.tcp://sparkdri...@ip-172-31-11-204.us-west-1.compute.internal:57641 got > disassociated, removing it. > INFO 22:10:41,853 SparkMaster: > akka.tcp://sparkdri...@ip-172-31-11-204.us-west-1.compute.internal:57641 got > disassociated, removing it. > INFO 22:10:41,857 SparkMaster: > akka.tcp://sparkdri...@ip-172-31-11-204.us-west-1.compute.internal:57641 got > disassociated, removing it. > INFO 22:10:41,862 SparkMaster: Adding host 172.31.11.204 (Analytics) > WARN 22:10:42,200 SparkMaster: Got status update for unknown executor > app-20141106221014-0000/0 > INFO 22:10:42,211 SparkWorker: Executor app-20141106221014-0000/0 finished > with state KILLED exitStatus 143 > {code} > /var/lib/spark/work/app-20141106221014-0000/0/stdout is on the disk. It is > trying to write to a close IO stream. > Spark worker shuts down by {code} > private def killProcess(message: Option[String]) { > var exitCode: Option[Int] = None > logInfo("Killing process!") > process.destroy() > process.waitFor() > if (stdoutAppender != null) { > stdoutAppender.stop() > } > if (stderrAppender != null) { > stderrAppender.stop() > } > if (process != null) { > exitCode = Some(process.waitFor()) > } > worker ! ExecutorStateChanged(appId, execId, state, message, exitCode) > > {code} > But stdoutAppender concurrently writes to output log file, which creates race > condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org