[ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256030#comment-16256030
 ] 

Zhiyuan Yang edited comment on TEZ-3846 at 11/16/17 10:01 PM:
--------------------------------------------------------------

[~ewohlstadter] It's done in TEZ-3858.


was (Author: aplusplus):
[~EricWohlstadter] It's done in TEZ-3858.

> Tez AM may not clean up properly on an internal error
> -----------------------------------------------------
>
>                 Key: TEZ-3846
>                 URL: https://issues.apache.org/jira/browse/TEZ-3846
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Zhiyuan Yang
>
> Normally, Hive blindly reopens the session on any submit error. However, I 
> accidentally broke that, and while investigating I noticed a new error, 
> raised before the reopen, claiming that a session in which a DAG has failed 
> is still running a DAG. The AM should either clean up or, if we assume an 
> OOM cannot be cleaned up after, die completely.
> {noformat}
> 2017-09-28T01:07:12,352  INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> client.TezClient: Submitted dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, 
> dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM (
> ...
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Status: Failed
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Vertex failed, vertexName=Map 61, 
> vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex 
> vertex_1506585924598_0001_53_01 [Map 61] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
> vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: 
> GC overhead limit exceeded
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Invalid event V_INTERNAL_ERROR on Vertex 
> vertex_1506585924598_0001_53_00 [Map 60]
> 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> log.PerfLogger: </PERFLOG method=TezRunDag start=1506586032352 
> end=1506586045787 duration=13435 
> from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor>
> ... [reuse]
> 2017-09-28T01:07:28,459  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> client.TezClient: Submitting dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, dagName=insert overwrite table 
> orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, 
> callerType=HIVE_QUERY_ID, 
> callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 }
> 2017-09-28T01:07:35,259  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> exec.Task: Dag submit failed due to App master already running a DAG
> {noformat}
> The session stays alive and keeps failing like that on repeated submits.
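
The "reopen the session on any submit error" behavior mentioned in the description can be illustrated with a minimal sketch. This is not Hive's or Tez's actual code: `DagSession`, `ReopenOnSubmitError`, and `firstSessionStuck` are hypothetical names invented for illustration, and the stand-in `submit` throws only unchecked exceptions to keep the example short. It shows a client-side workaround only, not the AM-side cleanup discussed in this issue.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a session handle; NOT the real TezClient API.
interface DagSession {
    void submit(String dagName);
    void close();
}

final class ReopenOnSubmitError {
    private final Supplier<DagSession> factory;
    private DagSession session;
    private int reopenCount = 0;

    ReopenOnSubmitError(Supplier<DagSession> factory) {
        this.factory = factory;
        this.session = factory.get();
    }

    int reopenCount() {
        return reopenCount;
    }

    // Submit once; on any error, discard the session, open a new one, retry once.
    void submit(String dagName) {
        try {
            session.submit(dagName);
        } catch (RuntimeException firstFailure) {
            try {
                session.close();
            } catch (RuntimeException ignored) {
                // Best effort: the old session may already be unusable.
            }
            session = factory.get();
            reopenCount++;
            session.submit(dagName); // a second failure propagates to the caller
        }
    }

    // Simulated factory: the first session rejects every submit, later ones accept.
    static Supplier<DagSession> firstSessionStuck() {
        int[] opened = {0};
        return () -> {
            final boolean stuck = (opened[0]++ == 0);
            return new DagSession() {
                public void submit(String dagName) {
                    if (stuck) {
                        throw new IllegalStateException("App master already running a DAG");
                    }
                }
                public void close() {}
            };
        };
    }

    public static void main(String[] args) {
        ReopenOnSubmitError client = new ReopenOnSubmitError(firstSessionStuck());
        client.submit("example-dag");
        System.out.println("reopened " + client.reopenCount() + " time(s)");
    }
}
```

With a reopen step like this in place, a stuck session costs one failed submit. Without it, every later submit hits the same "already running a DAG" error, as in the log above.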



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
