[
https://issues.apache.org/jira/browse/SPARK-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005504#comment-15005504
]
Jason Huang edited comment on SPARK-9844 at 11/14/15 5:38 PM:
--------------------------------------------------------------
I get the same error in the worker logs, and my workers keep getting disassociated.
{code}
15/11/15 01:25:26 INFO worker.Worker: Asked to kill executor app-20151115012248-0081/2
15/11/15 01:25:26 INFO worker.ExecutorRunner: Runner thread for executor app-20151115012248-0081/2 interrupted
15/11/15 01:25:26 INFO worker.ExecutorRunner: Killing process!
15/11/15 01:25:26 ERROR logging.FileAppender: Error writing stream to file /usr/local/spark-1.5.1-bin-hadoop2.6/work/app-20151115012248-0081/2/stderr
java.io.IOException: Stream closed
    at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
    at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
    at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
    at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
    at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
15/11/15 01:25:26 INFO worker.Worker: Executor app-20151115012248-0081/2 finished with state KILLED exitStatus 143
15/11/15 01:25:26 INFO worker.Worker: Cleaning up local directories for application app-20151115012248-0081
15/11/15 01:25:26 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:46780] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
15/11/15 01:25:26 INFO shuffle.ExternalShuffleBlockResolver: Application app-20151115012248-0081 removed, cleanupLocalDirs = true
{code}
We use Python 3 to run our Spark jobs:
{code:python}
#!/usr/bin/python3
import os
import sys

SPARK_HOME = "/usr/local/spark"
os.environ["SPARK_HOME"] = SPARK_HOME
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-7-oracle"
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
sys.path.append(os.path.join(SPARK_HOME, 'python'))
sys.path.append(os.path.join(SPARK_HOME, 'python/lib/py4j-0.8.2.1-src.zip'))

from pyspark import SparkContext, SparkConf

conf = (SparkConf().setMaster("spark://10.1.2.1:7077")
                   .setAppName("Generate")
                   .setAll((
                       ("spark.cores.max", "1"),
                       ("spark.driver.memory", "1g"),
                       ("spark.executor.memory", "1g"),
                       ("spark.python.worker.memory", "1g"))))
# Create the context with this configuration; the job body follows from here.
sc = SparkContext(conf=conf)
{code}
> File appender race condition during SparkWorker shutdown
> --------------------------------------------------------
>
> Key: SPARK-9844
> URL: https://issues.apache.org/jira/browse/SPARK-9844
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.3.0, 1.4.0
> Reporter: Alex Liu
>
> We find this issue still exists in 1.3.1
> {code}
> ERROR [Thread-6] 2015-07-28 22:49:57,653 SparkWorker-0 ExternalLogger.java:96 - Error writing stream to file /var/lib/spark/worker/worker-0/app-20150728224954-0003/0/stderr
> ERROR [Thread-6] 2015-07-28 22:49:57,653 SparkWorker-0 ExternalLogger.java:96 - java.io.IOException: Stream closed
> ERROR [Thread-6] 2015-07-28 22:49:57,654 SparkWorker-0 ExternalLogger.java:96 - at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) ~[na:1.8.0_40]
> ERROR [Thread-6] 2015-07-28 22:49:57,654 SparkWorker-0 ExternalLogger.java:96 - at java.io.BufferedInputStream.read1(BufferedInputStream.java:283) ~[na:1.8.0_40]
> ERROR [Thread-6] 2015-07-28 22:49:57,654 SparkWorker-0 ExternalLogger.java:96 - at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_40]
> ERROR [Thread-6] 2015-07-28 22:49:57,654 SparkWorker-0 ExternalLogger.java:96 - at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[na:1.8.0_40]
> ERROR [Thread-6] 2015-07-28 22:49:57,655 SparkWorker-0 ExternalLogger.java:96 - at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70) ~[spark-core_2.10-1.3.1.1.jar:1.3.1.1]
> ERROR [Thread-6] 2015-07-28 22:49:57,655 SparkWorker-0 ExternalLogger.java:96 - at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39) [spark-core_2.10-1.3.1.1.jar:1.3.1.1]
> ERROR [Thread-6] 2015-07-28 22:49:57,655 SparkWorker-0 ExternalLogger.java:96 - at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.3.1.1.jar:1.3.1.1]
> ERROR [Thread-6] 2015-07-28 22:49:57,655 SparkWorker-0 ExternalLogger.java:96 - at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.3.1.1.jar:1.3.1.1]
> ERROR [Thread-6] 2015-07-28 22:49:57,655 SparkWorker-0 ExternalLogger.java:96 - at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618) [spark-core_2.10-1.3.1.1.jar:1.3.1.1]
> ERROR [Thread-6] 2015-07-28 22:49:57,656 SparkWorker-0 ExternalLogger.java:96 - at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38) [spark-core_2.10-1.3.1.1.jar:1.3.1.1]
> {code}
> See
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala#L159
> The process shuts itself down, but the log appenders are still running
> against its closed streams, which produces these error messages.
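
To make the race concrete, here is a minimal, self-contained sketch (hypothetical code, not Spark's actual implementation; Scala 2.12+ for the lambda-as-Runnable syntax). A reader thread stands in for FileAppender's logging thread, a producer thread stands in for the executor process writing to stderr, and closing the stream from the main thread stands in for the process teardown during worker shutdown. The reader then typically hits the same java.io.IOException: Stream closed seen in the logs above.
{code:scala}
import java.io.{BufferedInputStream, IOException, PipedInputStream, PipedOutputStream}

object AppenderRaceSketch {
  def main(args: Array[String]): Unit = {
    // A pipe stands in for the executor process's stderr.
    val out = new PipedOutputStream()
    val in  = new BufferedInputStream(new PipedInputStream(out))

    // "Appender" thread: loops reading until EOF, like FileAppender's
    // logging thread; the write-to-file half of the loop is elided.
    val appender = new Thread(() => {
      val buf = new Array[Byte](1024)
      try {
        while (in.read(buf) != -1) ()
      } catch {
        // With the stream closed underneath it, read() throws here,
        // mirroring "ERROR logging.FileAppender: Error writing stream to file".
        case e: IOException => println(s"Error writing stream to file: $e")
      }
    })
    appender.start()

    // "Executor" thread: keeps producing output so the reader stays busy.
    val producer = new Thread(() => {
      try { while (true) out.write('x') } catch { case _: IOException => () }
    })
    producer.setDaemon(true)
    producer.start()

    // Shutdown path: close the stream out from under the reader, as killing
    // the executor process closes its stdout/stderr -- the race in question.
    Thread.sleep(100)
    in.close()
    appender.join()
  }
}
{code}
One way to avoid the race this sketch exhibits would be to stop the appender threads, and wait for them to exit their read loops, before tearing down the process's streams, rather than letting the close and the read run concurrently.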