[
https://issues.apache.org/jira/browse/SPARK-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-16190:
------------------------------------
Assignee: Apache Spark
> Worker registration failed: Duplicate worker ID
> -----------------------------------------------
>
> Key: SPARK-16190
> URL: https://issues.apache.org/jira/browse/SPARK-16190
> Project: Spark
> Issue Type: Bug
> Components: Scheduler, Spark Core
> Affects Versions: 1.6.1
> Reporter: Thomas Huang
> Assignee: Apache Spark
> Priority: Minor
> Attachments:
> spark-mqq-org.apache.spark.deploy.worker.Worker-1-slave19.out,
> spark-mqq-org.apache.spark.deploy.worker.Worker-1-slave2.out,
> spark-mqq-org.apache.spark.deploy.worker.Worker-1-slave7.out,
> spark-mqq-org.apache.spark.deploy.worker.Worker-1-slave8.out
>
>
> Several workers crashed simultaneously due to this error:
> Worker registration failed: Duplicate worker ID
> This is the worker log on one of those crashed workers:
> 16/06/24 16:28:53 INFO ExecutorRunner: Killing process!
> 16/06/24 16:28:53 INFO ExecutorRunner: Runner thread for executor
> app-20160624003013-0442/26 interrupted
> 16/06/24 16:28:53 INFO ExecutorRunner: Killing process!
> 16/06/24 16:29:03 WARN ExecutorRunner: Failed to terminate process:
> java.lang.UNIXProcess@31340137. This process will likely be orphaned.
> 16/06/24 16:29:03 WARN ExecutorRunner: Failed to terminate process:
> java.lang.UNIXProcess@4d3bdb1d. This process will likely be orphaned.
> 16/06/24 16:29:03 INFO Worker: Executor app-20160624003013-0442/8 finished
> with state KILLED
> 16/06/24 16:29:03 INFO Worker: Executor app-20160624003013-0442/26 finished
> with state KILLED
> 16/06/24 16:29:03 INFO Worker: Cleaning up local directories for application
> app-20160624003013-0442
> 16/06/24 16:31:18 INFO ExternalShuffleBlockResolver: Application
> app-20160624003013-0442 removed, cleanupLocalDirs = true
> 16/06/24 16:31:18 INFO Worker: Asked to launch executor
> app-20160624162905-0469/14 for SparkStreamingLRScala
> 16/06/24 16:31:18 INFO SecurityManager: Changing view acls to: mqq
> 16/06/24 16:31:18 INFO SecurityManager: Changing modify acls to: mqq
> 16/06/24 16:31:18 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(mqq); users with
> modify permissions: Set(mqq)
> 16/06/24 16:31:18 INFO ExecutorRunner: Launch command:
> "/data/jdk1.7.0_60/bin/java" "-cp"
> "/data/spark-1.6.1-bin-cdh4/conf/:/data/spark-1.6.1-bin-cdh4/lib/spark-assembly-1.6.1-hadoop2.3.0.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar"
> "-Xms10240M" "-Xmx10240M" "-Dspark.driver.port=34792" "-XX:MaxPermSize=256m"
> "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url"
> "spark://[email protected]:34792" "--executor-id" "14"
> "--hostname" "100.65.21.223" "--cores" "5" "--app-id"
> "app-20160624162905-0469" "--worker-url" "spark://[email protected]:46581"
> 16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077
> requested this worker to reconnect.
> 16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077
> requested this worker to reconnect.
> 16/06/24 16:31:18 INFO Worker: Connecting to master 100.65.21.199:7077...
> 16/06/24 16:31:18 INFO Worker: Successfully registered with master
> spark://100.65.21.199:7077
> 16/06/24 16:31:18 INFO Worker: Worker cleanup enabled; old application
> directories will be deleted in: /data/spark-1.6.1-bin-cdh4/work
> 16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with
> the master, since there is an attempt scheduled already.
> 16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077
> requested this worker to reconnect.
> 16/06/24 16:31:18 INFO Worker: Connecting to master 100.65.21.199:7077...
> 16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077
> requested this worker to reconnect.
> 16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with
> the master, since there is an attempt scheduled already.
> 16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077
> requested this worker to reconnect.
> 16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with
> the master, since there is an attempt scheduled already.
> 16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077
> requested this worker to reconnect.
> 16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with
> the master, since there is an attempt scheduled already.
> 16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077
> requested this worker to reconnect.
> 16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with
> the master, since there is an attempt scheduled already.
> 16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077
> requested this worker to reconnect.
> 16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with
> the master, since there is an attempt scheduled already.
> 16/06/24 16:31:18 INFO Worker: Master with url spark://100.65.21.199:7077
> requested this worker to reconnect.
> 16/06/24 16:31:18 INFO Worker: Not spawning another attempt to register with
> the master, since there is an attempt scheduled already.
> 16/06/24 16:31:18 INFO Worker: Asked to launch executor
> app-20160624153031-0467/27 for SparkRealtimeRecommender
> 16/06/24 16:31:18 INFO SecurityManager: Changing view acls to: mqq
> 16/06/24 16:31:18 INFO SecurityManager: Changing modify acls to: mqq
> 16/06/24 16:31:18 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(mqq); users with
> modify permissions: Set(mqq)
> 16/06/24 16:31:18 INFO ExecutorRunner: Launch command:
> "/data/jdk1.7.0_60/bin/java" "-cp"
> "/data/spark-1.6.1-bin-cdh4/conf/:/data/spark-1.6.1-bin-cdh4/lib/spark-assembly-1.6.1-hadoop2.3.0.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/data/spark-1.6.1-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar"
> "-Xms61440M" "-Xmx61440M" "-Dspark.driver.port=50193" "-XX:MaxPermSize=256m"
> "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url"
> "spark://[email protected]:50193" "--executor-id" "27"
> "--hostname" "100.65.21.223" "--cores" "46" "--app-id"
> "app-20160624153031-0467" "--worker-url" "spark://[email protected]:46581"
> 16/06/24 16:31:18 ERROR Worker: Worker registration failed: Duplicate worker
> ID
> 16/06/24 16:31:18 INFO ExecutorRunner: Killing process!
> 16/06/24 16:31:18 INFO ExecutorRunner: Killing process!
> 16/06/24 16:31:18 INFO ExecutorRunner: Killing process!
> 16/06/24 16:31:18 INFO ExecutorRunner: Killing process!
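
Note on the failure mode (analysis, not part of the reporter's log): the log shows the worker successfully re-registering at 16:31:18 while several master-initiated "requested this worker to reconnect" messages are still being processed; a later registration attempt for the same worker ID is then rejected, and the worker tears down its executors. Below is a minimal, hypothetical Scala sketch of that kind of ID-based bookkeeping on the master side. It is not the actual Spark Master source; the Master, WorkerInfo, and registerWorker names here are illustrative only and stand in for whatever the real registration path does.

// Simplified sketch: a master that keys registrations by worker ID and rejects
// a second registration while it still considers the first one alive.
object DuplicateWorkerIdSketch {

  // Hypothetical, minimal stand-in for the master's per-worker bookkeeping.
  final case class WorkerInfo(id: String, host: String, port: Int, alive: Boolean)

  final class Master {
    private var idToWorker = Map.empty[String, WorkerInfo]

    /** Returns an error message on rejection, None on success. */
    def registerWorker(w: WorkerInfo): Option[String] = {
      idToWorker.get(w.id) match {
        case Some(existing) if existing.alive =>
          // The master still holds a live entry for this ID, so the new
          // registration is refused; the worker would then log
          // "Worker registration failed: Duplicate worker ID" and shut down.
          Some("Duplicate worker ID")
        case _ =>
          idToWorker += (w.id -> w)
          None
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val master = new Master
    val worker = WorkerInfo("worker-20160624-100.65.21.223-46581", "100.65.21.223", 46581, alive = true)

    // First registration succeeds.
    println(master.registerWorker(worker))        // None
    // A reconnect-triggered retry arrives while the first entry is still live.
    println(master.registerWorker(worker.copy())) // Some("Duplicate worker ID")
  }
}

Under this reading, the repeated reconnect requests in the log effectively race an already successful registration against a queued retry, which would explain several workers hitting the same rejection at once.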