[ 
https://issues.apache.org/jira/browse/SPARK-20579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994916#comment-15994916
 ] 

yao zhang commented on SPARK-20579:
-----------------------------------

I recently used Spark (Scala) to do complex data processing on a large data set, but 
I repeatedly run into hung jobs with many active jobs/stages in the web UI.
Here are the details:  
   1. It only happens with spark-submit; when I test the same code in spark-shell, I 
never hit this issue.
   2. I always use dynamic allocation. 
   3. My cluster is large enough for my task (I can use 2000 executors with 8G memory 
each, and a driver with 100G memory).
   4. The basic failure pattern (from the UI) is: some stages never complete and stay 
active => the corresponding jobs stay incomplete and active => active tasks 
accumulate on executors => RDD blocks accumulate on executors => executors become 
locked => the application hangs and cannot make progress 
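For context, a submission matching the setup above (dynamic allocation, 8G executors capped at 2000, a 100G driver, YARN under Hadoop 2.6) would typically look like the sketch below. The class name and jar path are hypothetical placeholders, not the reporter's actual job; the config keys are standard Spark 2.1.x settings. Note that dynamic allocation on YARN also requires the external shuffle service.

```shell
# Illustrative spark-submit matching the reported setup (Spark 2.1.x on YARN).
# MyJob and my-job.jar are hypothetical placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 100g \
  --executor-memory 8g \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=2000 \
  --conf spark.shuffle.service.enabled=true \
  --class MyJob my-job.jar
```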



> large spark job hang on with many active stages/jobs
> ----------------------------------------------------
>
>                 Key: SPARK-20579
>                 URL: https://issues.apache.org/jira/browse/SPARK-20579
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 2.1.0
>         Environment: spark 2.1.0 in hadoop (2.6.0) cluster
>            Reporter: yao zhang
>         Attachments: executor-screen.png, job-screen.png, stage-screen.png, 
> storage-screen.png, thread-dump-screen.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
