[
https://issues.apache.org/jira/browse/SPARK-19764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889067#comment-15889067
]
Ari Gesher edited comment on SPARK-19764 at 2/28/17 11:04 PM:
--------------------------------------------------------------
That was the log in the application directory on the driver machine.
The other log was from the SPARK_LOG_DIR on one of the workers (*.112, as
referenced in the log snippets and WebUI in the body of the comments).
We're working on a repro to get you a stack trace.
was (Author: agesher):
That was the log in the application directory on the driver machine.
We're working on a repro to get you a stack trace.
> Executors hang with supposedly running task that are really finished.
> ---------------------------------------------------------------------
>
> Key: SPARK-19764
> URL: https://issues.apache.org/jira/browse/SPARK-19764
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Spark Core
> Affects Versions: 2.0.2
> Environment: Ubuntu 16.04 LTS
> OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
> Spark 2.0.2 - Spark Cluster Manager
> Reporter: Ari Gesher
> Attachments: driver-log-stderr.log, executor-2.log
>
>
> We've come across a job that won't finish. Running on a six-node cluster,
> each of the executors ends up with 5-7 tasks that are never marked as
> completed.
> Here's an excerpt from the web UI:
> ||Index ▴||ID||Attempt||Status||Locality Level||Executor ID / Host||Launch Time||Duration||Scheduler Delay||Task Deserialization Time||GC Time||Result Serialization Time||Getting Result Time||Peak Execution Memory||Shuffle Read Size / Records||Errors||
> |105|1131|0|SUCCESS|PROCESS_LOCAL|4 / 172.31.24.171|2017/02/27 22:51:36|1.9 min|9 ms|4 ms|0.7 s|2 ms|6 ms|384.1 MB|90.3 MB / 572| |
> |106|1168|0|RUNNING|ANY|2 / 172.31.16.112|2017/02/27 22:53:25|6.5 h|0 ms|0 ms|1 s|0 ms|0 ms| |384.1 MB|98.7 MB / 624| |
> However, the Executor reports the task as finished:
> {noformat}
> 17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)
> 17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). 2633558 bytes result sent via BlockManager)
> {noformat}
> As does the driver log:
> {noformat}
> 17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)
> 17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). 2633558 bytes result sent via BlockManager)
> {noformat}
> Full log from this executor and the {{stderr}} from
> {{app-20170227223614-0001/2/stderr}} attached.
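
One way to cross-check the web UI against an executor log is to scan the log for the "Running task" and "Finished task" lines quoted above and collect the TIDs. The following is a minimal sketch, not part of the original report: the function name scan_executor_log is hypothetical, and the regular expressions assume the log lines look exactly like the excerpts in the description.

```python
import re

# Patterns modelled on the two Executor log lines quoted in the description.
# Assumption: other Spark versions may format these lines differently.
RUNNING = re.compile(r"INFO Executor: Running task \S+ in stage \S+ \(TID (\d+)\)")
FINISHED = re.compile(r"INFO Executor: Finished task \S+ in stage \S+ \(TID (\d+)\)")


def scan_executor_log(lines):
    """Return (started, finished): sets of TIDs seen in the executor log.

    Hypothetical helper for illustration only.
    """
    started, finished = set(), set()
    for line in lines:
        match = RUNNING.search(line)
        if match:
            started.add(int(match.group(1)))
            continue
        match = FINISHED.search(line)
        if match:
            finished.add(int(match.group(1)))
    return started, finished


if __name__ == "__main__":
    # The two lines from the executor log excerpt in the description.
    excerpt = [
        "17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)",
        "17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). "
        "2633558 bytes result sent via BlockManager)",
    ]
    started, finished = scan_executor_log(excerpt)
    # TIDs the web UI still shows as RUNNING (taken from the table above).
    ui_running = {1168}
    # Tasks the executor logged as finished but the UI still lists as running.
    print(sorted(ui_running & finished))
```

Any TID in the intersection is a task the executor logged as finished while the UI still lists it as RUNNING, which is the mismatch described in this issue.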