[
https://issues.apache.org/jira/browse/SPARK-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yeliang Cang updated SPARK-25869:
---------------------------------
Description:
When configuring Spark on YARN, I submitted a job using the command below:
{code}
spark-submit --class org.apache.spark.examples.SparkPi --master yarn
--deploy-mode cluster --driver-memory 127m --driver-cores 1
--executor-memory 2048m --executor-cores 1 --num-executors 10 --queue
root.mr --conf spark.testing.reservedMemory=1048576 --conf
spark.yarn.executor.memoryOverhead=50 --conf
spark.yarn.driver.memoryOverhead=50
/opt/ZDH/parcels/lib/spark/examples/jars/spark-examples* 10000
{code}
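For context, the memory budget behind this command works out roughly as follows (a back-of-the-envelope sketch only, assuming the usual Spark-on-YARN model where the AM container limit is the driver memory plus spark.yarn.driver.memoryOverhead; the usage figures are taken from the process-tree dump further below):
{code}
// Back-of-the-envelope sketch only; figures are taken from the process-tree dump below.
object AmMemoryBudget {
  def main(args: Array[String]): Unit = {
    val driverMemoryMb = 127          // --driver-memory 127m
    val overheadMb     = 50           // spark.yarn.driver.memoryOverhead=50
    val requestedMb    = driverMemoryMb + overheadMb   // ~177 MB requested for the AM container

    val rssPages       = 65185L       // RSSMEM_USAGE(PAGES) from the dump
    val physicalUsedMb = rssPages * 4096 / 1024.0 / 1024.0        // ~255 MB (4 KB pages)
    val virtualUsedGb  = 3472551936L / 1024.0 / 1024.0 / 1024.0   // ~3.2 GB

    println(s"requested AM container: ~$requestedMb MB")
    println(f"physical memory used:   ~$physicalUsedMb%.0f MB")
    println(f"virtual memory used:    ~$virtualUsedGb%.1f GB")
    // A 127 MB heap plus 32 MB thread stacks (-Xss32M) and up to 512 MB of
    // metaspace quickly exceeds the container limit (~177 MB requested, possibly
    // rounded up to the YARN minimum allocation), so YARN kills the AM:
    // exit code -104 (physical-memory limit exceeded), then SIGTERM (143).
  }
}
{code}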
The driver memory is clearly not enough, but this cannot be seen in the
Spark client log:
{code}
2018-10-29 19:28:34,658 INFO org.apache.spark.deploy.yarn.Client: Application
report for application_1540536615315_0013 (state: ACCEPTED)
2018-10-29 19:28:35,660 INFO org.apache.spark.deploy.yarn.Client: Application
report for application_1540536615315_0013 (state: RUNNING)
2018-10-29 19:28:35,660 INFO org.apache.spark.deploy.yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.43.183.143
ApplicationMaster RPC port: 0
queue: root.mr
start time: 1540812501560
final status: UNDEFINED
tracking URL: http://zdh141:8088/proxy/application_1540536615315_0013/
user: mr
2018-10-29 19:28:36,663 INFO org.apache.spark.deploy.yarn.Client: Application
report for application_1540536615315_0013 (state: FINISHED)
2018-10-29 19:28:36,663 INFO org.apache.spark.deploy.yarn.Client:
client token: N/A
diagnostics: Shutdown hook called before final status was reported.
ApplicationMaster host: 10.43.183.143
ApplicationMaster RPC port: 0
queue: root.mr
start time: 1540812501560
final status: FAILED
tracking URL: http://zdh141:8088/proxy/application_1540536615315_0013/
user: mr
Exception in thread "main" org.apache.spark.SparkException: Application
application_1540536615315_0013 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1137)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1183)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-10-29 19:28:36,694 INFO org.apache.spark.util.ShutdownHookManager:
Shutdown hook called
2018-10-29 19:28:36,695 INFO org.apache.spark.util.ShutdownHookManager:
Deleting directory /tmp/spark-96077be5-0dfa-496d-a6a0-96e83393a8d9
{code}
Solution: after applying the patch, the Spark client log looks like this:
{code}
2018-10-29 19:27:32,962 INFO org.apache.spark.deploy.yarn.Client: Application
report for application_1540536615315_0012 (state: RUNNING)
2018-10-29 19:27:32,962 INFO org.apache.spark.deploy.yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.43.183.143
ApplicationMaster RPC port: 0
queue: root.mr
start time: 1540812436656
final status: UNDEFINED
tracking URL: http://zdh141:8088/proxy/application_1540536615315_0012/
user: mr
2018-10-29 19:27:33,964 INFO org.apache.spark.deploy.yarn.Client: Application
report for application_1540536615315_0012 (state: FAILED)
2018-10-29 19:27:33,964 INFO org.apache.spark.deploy.yarn.Client:
client token: N/A
diagnostics: Application application_1540536615315_0012 failed 2 times due to
AM Container for appattempt_1540536615315_0012_000002 exited with exitCode: -104
For more detailed output, check application tracking
page:http://zdh141:8088/cluster/app/application_1540536615315_0012Then, click
on links to logs of each attempt.
Diagnostics: virtual memory used. Killing container.
Dump of the process-tree for container_e53_1540536615315_0012_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS)
VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 1532 1528 1528 1528 (java) 1209 174 3472551936 65185 /usr/java/jdk/bin/java
-server -Xmx127m
-Djava.io.tmpdir=/data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/tmp
-Xss32M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M
-Dspark.yarn.app.container.log.dir=/data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001
org.apache.spark.deploy.yarn.ApplicationMaster --class
org.apache.spark.examples.SparkPi --jar
file:/opt/ZDH/parcels/lib/spark/examples/jars/spark-examples_2.11-2.2.1-zdh8.5.1.jar
--arg 10000 --properties-file
/data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/__spark_conf__/__spark_conf__.properties
|- 1528 1526 1528 1528 (bash) 0 0 108642304 309 /bin/bash -c
LD_LIBRARY_PATH=/opt/ZDH/parcels/lib/hadoop/lib/native: /usr/java/jdk/bin/java
-server -Xmx127m
-Djava.io.tmpdir=/data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/tmp
'-Xss32M' '-XX:MetaspaceSize=128M' '-XX:MaxMetaspaceSize=512M'
-Dspark.yarn.app.container.log.dir=/data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001
org.apache.spark.deploy.yarn.ApplicationMaster --class
'org.apache.spark.examples.SparkPi' --jar
file:/opt/ZDH/parcels/lib/spark/examples/jars/spark-examples_2.11-2.2.1-zdh8.5.1.jar
--arg '10000' --properties-file
/data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/__spark_conf__/__spark_conf__.properties
1>
/data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/stdout
2>
/data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
PmemUsageMBsMaxMBs is: 255.0 MBFailing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.mr
start time: 1540812436656
final status: FAILED
tracking URL: http://zdh141:8088/cluster/app/application_1540536615315_0012
user: mr
2018-10-29 19:27:34,542 INFO org.apache.spark.deploy.yarn.Client: Deleted
staging directory
hdfs://nameservice/user/mr/.sparkStaging/application_1540536615315_0012
Exception in thread "main" org.apache.spark.SparkException: Application
application_1540536615315_0012 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1137)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1183)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-10-29 19:27:34,548 INFO org.apache.spark.util.ShutdownHookManager:
Shutdown hook called
2018-10-29 19:27:34,549 INFO org.apache.spark.util.ShutdownHookManager:
Deleting directory /tmp/spark-ce35f2ad-ec1f-4173-9441-163e2482ed61
{code}
Now the true reason for the job failure can be seen from the client!
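For reference, the diagnostics that explain the failure are carried on YARN's ApplicationReport; the sketch below (illustrative only, not the actual patch attached to this issue) shows the kind of client-side reporting that surfaces them instead of the generic "finished with failed status" message:
{code}
// Illustrative sketch only -- not the patch attached to this issue.
import org.apache.hadoop.yarn.api.records.{ApplicationReport, FinalApplicationStatus}
import org.apache.spark.SparkException

object DiagnosticsReporting {
  def reportFailure(report: ApplicationReport): Unit = {
    if (report.getFinalApplicationStatus == FinalApplicationStatus.FAILED) {
      // getDiagnostics carries the ResourceManager's message, e.g. the
      // "exited with exitCode: -104 ... Killing container" text shown above.
      val diag = Option(report.getDiagnostics).map(_.trim).filter(_.nonEmpty).getOrElse("N/A")
      throw new SparkException(
        s"Application ${report.getApplicationId} finished with failed status. " +
        s"YARN diagnostics: $diag")
    }
  }
}
{code}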
was:
When configuring Spark on YARN, I submitted a job using the command below:
{code}
spark-submit --class org.apache.spark.examples.SparkPi --master yarn
--deploy-mode cluster --driver-memory 127m --driver-cores 1
--executor-memory 2048m --executor-cores 1 --num-executors 10 --queue
root.mr --conf spark.testing.reservedMemory=1048576 --conf
spark.yarn.executor.memoryOverhead=50 --conf
spark.yarn.driver.memoryOverhead=50
/opt/ZDH/parcels/lib/spark/examples/jars/spark-examples* 10000
{code}
The driver memory is clearly not enough, but this cannot be seen in the
Spark client log:
{code}
2018-10-29 19:28:34,658 INFO org.apache.spark.deploy.yarn.Client: Application
report for application_1540536615315_0013 (state: ACCEPTED)
2018-10-29 19:28:35,660 INFO org.apache.spark.deploy.yarn.Client: Application
report for application_1540536615315_0013 (state: RUNNING)
2018-10-29 19:28:35,660 INFO org.apache.spark.deploy.yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.43.183.143
ApplicationMaster RPC port: 0
queue: root.mr
start time: 1540812501560
final status: UNDEFINED
tracking URL: http://zdh141:8088/proxy/application_1540536615315_0013/
user: mr
2018-10-29 19:28:36,663 INFO org.apache.spark.deploy.yarn.Client: Application
report for application_1540536615315_0013 (state: FINISHED)
2018-10-29 19:28:36,663 INFO org.apache.spark.deploy.yarn.Client:
client token: N/A
diagnostics: Shutdown hook called before final status was reported.
ApplicationMaster host: 10.43.183.143
ApplicationMaster RPC port: 0
queue: root.mr
start time: 1540812501560
final status: FAILED
tracking URL: http://zdh141:8088/proxy/application_1540536615315_0013/
user: mr
Exception in thread "main" org.apache.spark.SparkException: Application
application_1540536615315_0013 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1137)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1183)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-10-29 19:28:36,694 INFO org.apache.spark.util.ShutdownHookManager:
Shutdown hook called
2018-10-29 19:28:36,695 INFO org.apache.spark.util.ShutdownHookManager:
Deleting directory /tmp/spark-96077be5-0dfa-496d-a6a0-96e83393a8d9
{code}
Solution: after applying the patch, the Spark client log looks like this:
{code}
2018-10-29 19:27:32,962 INFO org.apache.spark.deploy.yarn.Client: Application
report for application_1540536615315_0012 (state: RUNNING)
2018-10-29 19:27:32,962 INFO org.apache.spark.deploy.yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.43.183.143
ApplicationMaster RPC port: 0
queue: root.mr
start time: 1540812436656
final status: UNDEFINED
tracking URL: http://zdh141:8088/proxy/application_1540536615315_0012/
user: mr
2018-10-29 19:27:33,964 INFO org.apache.spark.deploy.yarn.Client: Application
report for application_1540536615315_0012 (state: FAILED)
2018-10-29 19:27:33,964 INFO org.apache.spark.deploy.yarn.Client:
client token: N/A
diagnostics: Application application_1540536615315_0012 failed 2 times due to
AM Container for appattempt_1540536615315_0012_000002 exited with exitCode: -104
For more detailed output, check application tracking
page:http://zdh141:8088/cluster/app/application_1540536615315_0012Then, click
on links to logs of each attempt.
Diagnostics: virtual memory used. Killing container.
Dump of the process-tree for container_e53_1540536615315_0012_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS)
VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 1532 1528 1528 1528 (java) 1209 174 3472551936 65185 /usr/java/jdk/bin/java
-server -Xmx127m
-Djava.io.tmpdir=/data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/tmp
-Xss32M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M
-Dspark.yarn.app.container.log.dir=/data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001
org.apache.spark.deploy.yarn.ApplicationMaster --class
org.apache.spark.examples.SparkPi --jar
file:/opt/ZDH/parcels/lib/spark/examples/jars/spark-examples_2.11-2.2.1-zdh8.5.1.jar
--arg 10000 --properties-file
/data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/__spark_conf__/__spark_conf__.properties
|- 1528 1526 1528 1528 (bash) 0 0 108642304 309 /bin/bash -c
LD_LIBRARY_PATH=/opt/ZDH/parcels/lib/hadoop/lib/native: /usr/java/jdk/bin/java
-server -Xmx127m
-Djava.io.tmpdir=/data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/tmp
'-Xss32M' '-XX:MetaspaceSize=128M' '-XX:MaxMetaspaceSize=512M'
-Dspark.yarn.app.container.log.dir=/data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001
org.apache.spark.deploy.yarn.ApplicationMaster --class
'org.apache.spark.examples.SparkPi' --jar
file:/opt/ZDH/parcels/lib/spark/examples/jars/spark-examples_2.11-2.2.1-zdh8.5.1.jar
--arg '10000' --properties-file
/data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/__spark_conf__/__spark_conf__.properties
1>
/data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/stdout
2>
/data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
PmemUsageMBsMaxMBs is: 255.0 MBFailing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.mr
start time: 1540812436656
final status: FAILED
tracking URL: http://zdh141:8088/cluster/app/application_1540536615315_0012
user: mr
2018-10-29 19:27:34,542 INFO org.apache.spark.deploy.yarn.Client: Deleted
staging directory
hdfs://nameservice/user/mr/.sparkStaging/application_1540536615315_0012
Exception in thread "main" org.apache.spark.SparkException: Application
application_1540536615315_0012 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1137)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1183)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-10-29 19:27:34,548 INFO org.apache.spark.util.ShutdownHookManager:
Shutdown hook called
2018-10-29 19:27:34,549 INFO org.apache.spark.util.ShutdownHookManager:
Deleting directory /tmp/spark-ce35f2ad-ec1f-4173-9441-163e2482ed61
{code}
> Spark on YARN: the original diagnostics is missing when job failed
> maxAppAttempts times
> ---------------------------------------------------------------------------------------
>
> Key: SPARK-25869
> URL: https://issues.apache.org/jira/browse/SPARK-25869
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 2.1.1
> Reporter: Yeliang Cang
> Priority: Major
>
> When configuring Spark on YARN, I submitted a job using the command below:
> {code}
> spark-submit --class org.apache.spark.examples.SparkPi --master yarn
> --deploy-mode cluster --driver-memory 127m --driver-cores 1
> --executor-memory 2048m --executor-cores 1 --num-executors 10 --queue
> root.mr --conf spark.testing.reservedMemory=1048576 --conf
> spark.yarn.executor.memoryOverhead=50 --conf
> spark.yarn.driver.memoryOverhead=50
> /opt/ZDH/parcels/lib/spark/examples/jars/spark-examples* 10000
> {code}
> The driver memory is clearly not enough, but this cannot be seen in the
> Spark client log:
> {code}
> 2018-10-29 19:28:34,658 INFO org.apache.spark.deploy.yarn.Client: Application
> report for application_1540536615315_0013 (state: ACCEPTED)
> 2018-10-29 19:28:35,660 INFO org.apache.spark.deploy.yarn.Client: Application
> report for application_1540536615315_0013 (state: RUNNING)
> 2018-10-29 19:28:35,660 INFO org.apache.spark.deploy.yarn.Client:
> client token: N/A
> diagnostics: N/A
> ApplicationMaster host: 10.43.183.143
> ApplicationMaster RPC port: 0
> queue: root.mr
> start time: 1540812501560
> final status: UNDEFINED
> tracking URL: http://zdh141:8088/proxy/application_1540536615315_0013/
> user: mr
> 2018-10-29 19:28:36,663 INFO org.apache.spark.deploy.yarn.Client: Application
> report for application_1540536615315_0013 (state: FINISHED)
> 2018-10-29 19:28:36,663 INFO org.apache.spark.deploy.yarn.Client:
> client token: N/A
> diagnostics: Shutdown hook called before final status was reported.
> ApplicationMaster host: 10.43.183.143
> ApplicationMaster RPC port: 0
> queue: root.mr
> start time: 1540812501560
> final status: FAILED
> tracking URL: http://zdh141:8088/proxy/application_1540536615315_0013/
> user: mr
> Exception in thread "main" org.apache.spark.SparkException: Application
> application_1540536615315_0013 finished with failed status
> at org.apache.spark.deploy.yarn.Client.run(Client.scala:1137)
> at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1183)
> at org.apache.spark.deploy.yarn.Client.main(Client.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 2018-10-29 19:28:36,694 INFO org.apache.spark.util.ShutdownHookManager:
> Shutdown hook called
> 2018-10-29 19:28:36,695 INFO org.apache.spark.util.ShutdownHookManager:
> Deleting directory /tmp/spark-96077be5-0dfa-496d-a6a0-96e83393a8d9
> {code}
>
>
> Solution: after applying the patch, the Spark client log looks like this:
> {code}
> 2018-10-29 19:27:32,962 INFO org.apache.spark.deploy.yarn.Client: Application
> report for application_1540536615315_0012 (state: RUNNING)
> 2018-10-29 19:27:32,962 INFO org.apache.spark.deploy.yarn.Client:
> client token: N/A
> diagnostics: N/A
> ApplicationMaster host: 10.43.183.143
> ApplicationMaster RPC port: 0
> queue: root.mr
> start time: 1540812436656
> final status: UNDEFINED
> tracking URL: http://zdh141:8088/proxy/application_1540536615315_0012/
> user: mr
> 2018-10-29 19:27:33,964 INFO org.apache.spark.deploy.yarn.Client: Application
> report for application_1540536615315_0012 (state: FAILED)
> 2018-10-29 19:27:33,964 INFO org.apache.spark.deploy.yarn.Client:
> client token: N/A
> diagnostics: Application application_1540536615315_0012 failed 2 times due
> to AM Container for appattempt_1540536615315_0012_000002 exited with
> exitCode: -104
> For more detailed output, check application tracking
> page:http://zdh141:8088/cluster/app/application_1540536615315_0012Then, click
> on links to logs of each attempt.
> Diagnostics: virtual memory used. Killing container.
> Dump of the process-tree for container_e53_1540536615315_0012_02_000001 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 1532 1528 1528 1528 (java) 1209 174 3472551936 65185
> /usr/java/jdk/bin/java -server -Xmx127m
> -Djava.io.tmpdir=/data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/tmp
> -Xss32M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M
> -Dspark.yarn.app.container.log.dir=/data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001
> org.apache.spark.deploy.yarn.ApplicationMaster --class
> org.apache.spark.examples.SparkPi --jar
> file:/opt/ZDH/parcels/lib/spark/examples/jars/spark-examples_2.11-2.2.1-zdh8.5.1.jar
> --arg 10000 --properties-file
> /data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/__spark_conf__/__spark_conf__.properties
> |- 1528 1526 1528 1528 (bash) 0 0 108642304 309 /bin/bash -c
> LD_LIBRARY_PATH=/opt/ZDH/parcels/lib/hadoop/lib/native:
> /usr/java/jdk/bin/java -server -Xmx127m
> -Djava.io.tmpdir=/data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/tmp
> '-Xss32M' '-XX:MetaspaceSize=128M' '-XX:MaxMetaspaceSize=512M'
> -Dspark.yarn.app.container.log.dir=/data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001
> org.apache.spark.deploy.yarn.ApplicationMaster --class
> 'org.apache.spark.examples.SparkPi' --jar
> file:/opt/ZDH/parcels/lib/spark/examples/jars/spark-examples_2.11-2.2.1-zdh8.5.1.jar
> --arg '10000' --properties-file
> /data3/zdh/yarn/local/usercache/mr/appcache/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/__spark_conf__/__spark_conf__.properties
> 1>
> /data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/stdout
> 2>
> /data1/zdh/yarn/logs/userlogs/application_1540536615315_0012/container_e53_1540536615315_0012_02_000001/stderr
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> PmemUsageMBsMaxMBs is: 255.0 MBFailing this attempt. Failing the application.
> ApplicationMaster host: N/A
> ApplicationMaster RPC port: -1
> queue: root.mr
> start time: 1540812436656
> final status: FAILED
> tracking URL: http://zdh141:8088/cluster/app/application_1540536615315_0012
> user: mr
> 2018-10-29 19:27:34,542 INFO org.apache.spark.deploy.yarn.Client: Deleted
> staging directory
> hdfs://nameservice/user/mr/.sparkStaging/application_1540536615315_0012
> Exception in thread "main" org.apache.spark.SparkException: Application
> application_1540536615315_0012 finished with failed status
> at org.apache.spark.deploy.yarn.Client.run(Client.scala:1137)
> at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1183)
> at org.apache.spark.deploy.yarn.Client.main(Client.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 2018-10-29 19:27:34,548 INFO org.apache.spark.util.ShutdownHookManager:
> Shutdown hook called
> 2018-10-29 19:27:34,549 INFO org.apache.spark.util.ShutdownHookManager:
> Deleting directory /tmp/spark-ce35f2ad-ec1f-4173-9441-163e2482ed61
> {code}
> Now the true reason for the job failure can be seen from the client!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]