[ https://issues.apache.org/jira/browse/HIVE-28112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor reassigned HIVE-28112:
-----------------------------------
Assignee: László Bodor
> Clear dagId from MDC/NDC when re-executing the query with new dagId
> -------------------------------------------------------------------
>
> Key: HIVE-28112
> URL: https://issues.apache.org/jira/browse/HIVE-28112
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
>
> 1. The DAG fails (dag_1709708735265_0007_94,
> hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced)
> {code}
> <14>1 2024-03-07T16:29:54.292Z hiveserver2-0 hiveserver2 1
> 87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="client.DAGClientImpl"
> dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION"
> queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced"
> sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab"
> thread="HiveServer2-Background-Pool: Thread-1432633"] DAG completed.
> FinalState=FAILED
> {code}
> 2. The lost-AM plugin (ReExecuteLostAMQueryPlugin) decides to re-execute:
> {code}
> <14>1 2024-03-07T16:29:54.301Z hiveserver2-0 hiveserver2 1
> 87c9f023-150f-4923-b044-3b4207506951 [mdc@18060
> class="reexec.ReExecuteLostAMQueryPlugin" dagId="dag_1709708735265_0007_94"
> level="INFO" operationLogLevel="EXECUTION"
> queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced"
> sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab"
> thread="HiveServer2-Background-Pool: Thread-1432633"] Got exception message:
> AM record not found (likely died) in zookeeper for application id:
> application_1709708735265_0007 retryPossible: true
> {code}
> 3. Some messages that belong to the new execution (when there is no DAG at
> all yet) still show the previous dagId, which is confusing, e.g.:
> {code}
> <14>1 2024-03-07T16:29:54.348Z hiveserver2-0 hiveserver2 1
> 87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="ql.Driver"
> dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION"
> queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced"
> sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab"
> thread="HiveServer2-Background-Pool: Thread-1432633"] Compiling command(...
> {code}
> While a query is compiling, a dag id is not supposed to be present at all:
> the dag id is obtained from the AM upon dag submission.
> The new dag id will correspond to the same hive query id, so the hive query
> id can be used to keep the connection between query attempts.
> The last message for which the failed/last dag id still makes sense is:
> {code}
> <14>1 2024-03-07T16:29:54.309Z hiveserver2-0 hiveserver2 1
> 87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="reexec.ReExecDriver"
> dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION"
> queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced"
> sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab"
> thread="HiveServer2-Background-Pool: Thread-1432633"] Preparing to re-execute
> query
> {code}
> So we might want to remove the dagId from MDC/NDC around this point:
> "Preparing to re-execute query"
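
A minimal sketch of the proposed behaviour, not the actual Hive change. To stay self-contained it uses a hypothetical ThreadLocal map as a stand-in for the logging MDC; the class and method names (DagIdMdcSketch, onDagSubmitted, prepareToReExecute, log) are illustrative assumptions and do not exist in Hive. With SLF4J the removal step would be org.slf4j.MDC.remove("dagId"); which MDC facade the re-execution code path goes through is also an assumption here.

```java
import java.util.Map;
import java.util.TreeMap;

class DagIdMdcSketch {
    // Hypothetical stand-in for the logging MDC: a per-thread key/value context.
    static final ThreadLocal<Map<String, String>> MDC =
            ThreadLocal.withInitial(TreeMap::new);

    static final String DAG_ID_KEY = "dagId";

    // The dag id only becomes known once the DAG has been submitted to the AM.
    static void onDagSubmitted(String dagId) {
        MDC.get().put(DAG_ID_KEY, dagId);
    }

    // Formats a message together with the current context, like the log lines above.
    static String log(String message) {
        String line = "[mdc " + MDC.get() + "] " + message;
        System.out.println(line);
        return line;
    }

    // Last point where the failed dag id is still meaningful: log it, then drop it.
    // queryId and sessionId stay in place, so the attempts remain linked by query id.
    static String prepareToReExecute() {
        String line = log("Preparing to re-execute query");
        MDC.get().remove(DAG_ID_KEY); // with SLF4J: org.slf4j.MDC.remove("dagId")
        return line;
    }

    public static void main(String[] args) {
        MDC.get().put("queryId", "hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced");
        onDagSubmitted("dag_1709708735265_0007_94");
        log("DAG completed. FinalState=FAILED");
        prepareToReExecute();
        log("Compiling command(..."); // no stale dagId in the context any more
    }
}
```

Removing the key right after the "Preparing to re-execute query" message keeps the failed dag id on every line that still refers to it, while the compile phase of the next attempt logs with the query id only until a new dag id is set on submission.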