[
https://issues.apache.org/jira/browse/SPARK-25527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628392#comment-16628392
]
Ran Haim commented on SPARK-25527:
----------------------------------
Hi, actually, as I said, there is one task (in the last stage) that is waiting
to be started.
I do not know exactly what to look for in the thread dump, so I uploaded it:
[^threaddumpjob.txt]
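
As a rough starting point for reading a thread dump like the one attached, the
sketch below groups threads by state so that blocked or waiting scheduler
threads stand out. It assumes the dump is in jstack-style text format (quoted
thread name on a header line, followed by a "java.lang.Thread.State:" line);
the attachment's real format has not been checked. The function name
summarize_thread_dump and the SAMPLE text are hypothetical, and the thread name
"dag-scheduler-event-loop" in the sample is only an illustration of a thread
one might look for.

```python
import re
from collections import Counter

# Assumed jstack-style header line: "thread-name" #12 daemon prio=5 ...
HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Assumed state line: java.lang.Thread.State: WAITING (parking)
STATE = re.compile(r'^\s*java\.lang\.Thread\.State:\s*(?P<state>[A-Z_]+)')


def summarize_thread_dump(text):
    """Return (state counts, {thread name: state}) for a jstack-style dump."""
    states = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            # New thread entry; state stays UNKNOWN until a state line is seen.
            current = header.group("name")
            states[current] = "UNKNOWN"
            continue
        state = STATE.match(line)
        if state and current is not None:
            states[current] = state.group("state")
    return Counter(states.values()), states


# Hypothetical two-thread sample, not taken from the attached dump.
SAMPLE = '''"dag-scheduler-event-loop" #50 daemon prio=5 os_prio=0 tid=0x1 nid=0x2 waiting on condition [0x3]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)

"main" #1 prio=5 os_prio=0 tid=0x4 nid=0x5 runnable [0x6]
   java.lang.Thread.State: RUNNABLE
'''

if __name__ == "__main__":
    counts, by_name = summarize_thread_dump(SAMPLE)
    print(counts)
    for name, state in by_name.items():
        print(name, state)
```

To try it on a real dump, read the file into a string and pass it to
summarize_thread_dump in place of SAMPLE. Threads in BLOCKED state, and the
stack frames under them, are usually the first thing worth a closer look.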
> Job stuck waiting for last stage to start
> -----------------------------------------
>
> Key: SPARK-25527
> URL: https://issues.apache.org/jira/browse/SPARK-25527
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Ran Haim
> Priority: Major
> Attachments: threaddumpjob.txt
>
>
> Occasionally a job gets stuck waiting for the last stage to start.
> No tasks are waiting for completion, and the job just hangs.
> There are executors available for the job to run on.
> I do not know how to reproduce this; all I know is that it happens randomly
> after a couple of days of heavy load.
> One more detail that might help: it seems to happen when some tasks fail
> because one or more executors were killed (due to memory issues or similar).
> Those tasks do eventually get finished by other executors through retries,
> but the next stage hangs.