[
https://issues.apache.org/jira/browse/SPARK-9416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646365#comment-14646365
]
Apache Spark commented on SPARK-9416:
-------------------------------------
User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7751
> Yarn logs say that Spark Python job has succeeded even though job has failed
> in Yarn cluster mode
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-9416
> URL: https://issues.apache.org/jira/browse/SPARK-9416
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.4.1
> Environment: 3.13.0-53-generic #89-Ubuntu SMP Wed May 20 10:34:39 UTC
> 2015 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Elkhan Dadashov
>
> While running the Spark word count Python example with an intentional
> mistake in Yarn cluster mode, the Spark terminal logs (Yarn logs) state the
> final status as SUCCEEDED, but the log files for the Spark application show
> the correct result, indicating that the job failed.
> The terminal log output and the application log output contradict each other.
> If I run the same job in local mode, the terminal logs and application logs
> match: both state that the job failed due to the expected error in the
> Python script.
> More details: Scenario
> While running the Spark word count Python example in Yarn cluster mode, I
> make an intentional error in wordcount.py by changing this line (I'm using
> Spark 1.4.1, but this problem also exists in Spark 1.4.0 and 1.3.0, which I
> tested):
> lines = sc.textFile(sys.argv[1], 1)
> into this line:
> lines = sc.textFile(nonExistentVariable,1)
> where the nonExistentVariable variable was never created or initialized.
> Then I run that example with this command (I put README.md into HDFS before
> running it):
> ./bin/spark-submit --master yarn-cluster wordcount.py /README.md
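For context, the failure mode here is an ordinary Python NameError in the driver script. A minimal sketch (no PySpark required; the function name is made up for illustration) that reproduces the same error:

```python
# Sketch only: referencing a name that was never defined raises NameError
# at the driver, which should cause the whole job to be reported as failed.

def broken_driver():
    # Mirrors the modified wordcount.py line:
    #   lines = sc.textFile(nonExistentVariable, 1)
    lines = nonExistentVariable  # never created or initialized
    return lines

try:
    broken_driver()
except NameError as e:
    err = str(e)

print(err)  # message names the undefined variable
```

An uncaught exception like this makes the Python interpreter exit with a nonzero status, which is the signal the YARN-side status reporting should be picking up.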
> The job runs and finishes successfully according to the log printed in the
> terminal:
> Terminal logs:
> ...
> 15/07/23 16:19:17 INFO yarn.Client: Application report for
> application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:18 INFO yarn.Client: Application report for
> application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:19 INFO yarn.Client: Application report for
> application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:20 INFO yarn.Client: Application report for
> application_1437612288327_0013 (state: RUNNING)
> 15/07/23 16:19:21 INFO yarn.Client: Application report for
> application_1437612288327_0013 (state: FINISHED)
> 15/07/23 16:19:21 INFO yarn.Client:
> client token: N/A
> diagnostics: Shutdown hook called before final status was reported.
> ApplicationMaster host: 10.0.53.59
> ApplicationMaster RPC port: 0
> queue: default
> start time: 1437693551439
> final status: SUCCEEDED
> tracking URL:
> http://localhost:8088/proxy/application_1437612288327_0013/history/application_1437612288327_0013/1
> user: edadashov
> 15/07/23 16:19:21 INFO util.Utils: Shutdown hook called
> 15/07/23 16:19:21 INFO util.Utils: Deleting directory
> /tmp/spark-eba0a1b5-a216-4afa-9c54-a3cb67b16444
> But if I look at the log files generated for this application in HDFS, they
> indicate failure of the job with the correct reason:
> Application log files:
> ...
> stdout:
> Traceback (most recent call last):
> File "wordcount.py", line 32, in <module>
> lines = sc.textFile(nonExistentVariable,1)
> NameError: name 'nonExistentVariable' is not defined
> The terminal output (Yarn logs), final status: SUCCEEDED, does not match the
> application log result: failure of the job (NameError: name
> 'nonExistentVariable' is not defined).
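The mismatch suggests the final status is being reported without consulting the Python driver's exit code. A minimal sketch of the expected behavior, assuming a launcher that runs the driver as a subprocess (the command and function names below are hypothetical stand-ins, not Spark's actual internals):

```python
import subprocess
import sys

def run_driver(cmd):
    """Run a Python driver as a subprocess and return its exit code.

    A launcher deciding the YARN final status should map a nonzero
    exit code (an uncaught NameError exits with 1) to FAILED, never
    to SUCCEEDED.
    """
    return subprocess.run(cmd).returncode

# A driver that dies with NameError, like the modified wordcount.py:
exit_code = run_driver([sys.executable, "-c", "lines = nonExistentVariable"])
final_status = "SUCCEEDED" if exit_code == 0 else "FAILED"
print(exit_code, final_status)
```

Under this assumption, the bug reported here corresponds to the launcher ignoring `exit_code` and reporting SUCCEEDED unconditionally.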
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]