[ 
https://issues.apache.org/jira/browse/SPARK-32898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197341#comment-17197341
 ] 

wuyi commented on SPARK-32898:
------------------------------

I think the issue (for executorRunTimeMs) is this: before a task reaches 
"taskStartTimeNs = System.nanoTime()", it might already have been killed (e.g., 
by another successful attempt). In that case taskStartTimeNs never gets 
initialized and remains 0. However, executorRunTimeMs is still calculated as 
"System.nanoTime() - taskStartTimeNs" in 
collectAccumulatorsAndResetStatusOnFailure, which yields a wrong, very large 
value when taskStartTimeNs = 0.
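
To illustrate the arithmetic, here is a minimal standalone sketch. It is not 
Spark's actual code: the object RunTimeSketch and both helper names are 
hypothetical, and treating "taskStartTimeNs > 0" as the "task actually started" 
check is only an assumption about how a guard could look.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical helpers for illustration only; not part of Spark.
object RunTimeSketch {
  // Unguarded calculation, as described above: if taskStartTimeNs is still 0,
  // the result is simply the current nanoTime reading converted to ms.
  def unguardedRunTimeMs(taskStartTimeNs: Long, nowNs: Long): Long =
    TimeUnit.NANOSECONDS.toMillis(nowNs - taskStartTimeNs)

  // Guarded variant (assumption): report 0 when the start time was never
  // recorded, i.e. the task was killed before it began running.
  def guardedRunTimeMs(taskStartTimeNs: Long, nowNs: Long): Long =
    if (taskStartTimeNs > 0) TimeUnit.NANOSECONDS.toMillis(nowNs - taskStartTimeNs)
    else 0L
}
```

The current time is passed in as nowNs rather than read from System.nanoTime() 
so that the two cases (start time recorded vs. left at 0) can be compared with 
fixed inputs.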

 

I haven't taken a detailed look at submissionTime, but it sounds like a 
different issue, though it may be caused by the same kind of logic hole.

 

I'd like to fix executorRunTimeMs first, if [~linhongliu-db] doesn't mind.

> totalExecutorRunTimeMs is too big
> ---------------------------------
>
>                 Key: SPARK-32898
>                 URL: https://issues.apache.org/jira/browse/SPARK-32898
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Linhong Liu
>            Priority: Major
>
> This might be because of incorrectly calculating executorRunTimeMs in 
> Executor.scala
>  The function collectAccumulatorsAndResetStatusOnFailure(taskStartTimeNs) can 
> be called when taskStartTimeNs is not set yet (it is 0).
> As of now in master branch, here is the problematic code: 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470]
>  
> An exception is thrown before this line, but the catch branch still updates 
> the metric.
>  However, the query shows as successful. Maybe this task is speculative; I am 
> not sure.
>  
> submissionTime in LiveExecutionData may also have a similar problem.
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L449]
>  


