Satish Kolli created SPARK-13514:
------------------------------------

             Summary: Spark Shuffle Service 1.6.0 issue in Yarn 
                 Key: SPARK-13514
                 URL: https://issues.apache.org/jira/browse/SPARK-13514
             Project: Spark
          Issue Type: Bug
            Reporter: Satish Kolli


The Spark 1.6.0 shuffle service on YARN fails with an unknown exception: the 
executor's container exits with code 1 (see the log below). When I replace the 
Spark shuffle jar with the 1.5.2 jar, the following succeeds without any 
issues.

Hadoop Version: 2.5.1 (Kerberos Enabled)
Spark Version: 1.6.0

{code}
$SPARK_HOME/bin/spark-shell \
--master yarn \
--deploy-mode client \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.minExecutors=5 \
--conf spark.yarn.executor.memoryOverhead=2048 \
--conf spark.shuffle.service.enabled=true \
--conf spark.scheduler.mode=FAIR \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--executor-memory 6G \
--driver-memory 8G
{code}

{code}
scala> val df = sc.parallelize(1 to 50).toDF
df: org.apache.spark.sql.DataFrame = [_1: int]

scala> df.show(50)
{code}

{code}
16/02/26 08:20:53 INFO spark.SparkContext: Starting job: show at <console>:30
16/02/26 08:20:53 INFO scheduler.DAGScheduler: Got job 0 (show at <console>:30) with 1 output partitions
16/02/26 08:20:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (show at <console>:30)
16/02/26 08:20:53 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/02/26 08:20:53 INFO scheduler.DAGScheduler: Missing parents: List()
16/02/26 08:20:53 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at show at <console>:30), which has no missing parents
16/02/26 08:20:53 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.2 KB, free 2.2 KB)
16/02/26 08:20:53 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1411.0 B, free 3.6 KB)
16/02/26 08:20:53 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.5.76.106:46683 (size: 1411.0 B, free: 5.5 GB)
16/02/26 08:20:53 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/02/26 08:20:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at show at <console>:30)
16/02/26 08:20:53 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
16/02/26 08:20:53 INFO scheduler.FairSchedulableBuilder: Added task set TaskSet_0 tasks to pool default
16/02/26 08:20:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, XXXXXXXXXXXXXXXXXXXXXXXX, partition 0,PROCESS_LOCAL, 2031 bytes)
16/02/26 08:20:53 INFO cluster.YarnClientSchedulerBackend: Disabling executor 2.
16/02/26 08:20:54 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 0)
16/02/26 08:20:54 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
16/02/26 08:20:54 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, XXXXXXXXXXXXXXXXXXXXXXXX, 48113)
16/02/26 08:20:54 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor
16/02/26 08:20:54 ERROR cluster.YarnScheduler: Lost executor 2 on XXXXXXXXXXXXXXXXXXXXXXXX: Container marked as failed: container_1456492687549_0001_01_000003 on host: XXXXXXXXXXXXXXXXXXXXXXXX. Exit status: 1. Diagnostics: Exception from container-launch: ExitCodeException exitCode=1:
ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
{code}


