[
https://issues.apache.org/jira/browse/HIVE-28112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-28112:
--------------------------------
Description:
1. The DAG fails (dagId dag_1709708735265_0007_94,
queryId hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced):
{code}
<14>1 2024-03-07T16:29:54.292Z hiveserver2-0 hiveserver2 1
87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="client.DAGClientImpl"
dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION"
queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced"
sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab"
thread="HiveServer2-Background-Pool: Thread-1432633"] DAG completed.
FinalState=FAILED
{code}
2. The lost-AM plugin (ReExecuteLostAMQueryPlugin) decides to re-execute:
{code}
<14>1 2024-03-07T16:29:54.301Z hiveserver2-0 hiveserver2 1
87c9f023-150f-4923-b044-3b4207506951 [mdc@18060
class="reexec.ReExecuteLostAMQueryPlugin" dagId="dag_1709708735265_0007_94"
level="INFO" operationLogLevel="EXECUTION"
queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced"
sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab"
thread="HiveServer2-Background-Pool: Thread-1432633"] Got exception message: AM
record not found (likely died) in zookeeper for application id:
application_1709708735265_0007 retryPossible: true
{code}
3. Some messages that belong to the new execution (when there is no DAG at
all yet) still show the previous dagId, which is confusing, e.g.:
{code}
<14>1 2024-03-07T16:29:54.348Z hiveserver2-0 hiveserver2 1
87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="ql.Driver"
dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION"
queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced"
sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab"
thread="HiveServer2-Background-Pool: Thread-1432633"] Compiling command(...
{code}
While a query is being compiled, no dagId should be present at all: the dagId
is only obtained from the AM upon DAG submission.
The new dagId will correspond to the same Hive queryId, so the queryId can be
used to correlate the query attempts.
The last message for which the failed/last dagId makes sense is:
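To illustrate how the queryId ties attempts together, here is a minimal sketch that groups the dagIds seen in log lines by queryId. It assumes only the key="value" attribute layout visible in the log excerpts of this issue; the class and method names are hypothetical and not part of Hive.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Hive): correlates DAG attempts via queryId. */
final class AttemptCorrelationSketch {
  // Matches attributes such as dagId="dag_..." or queryId="hive_...".
  private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

  /** Extracts all key="value" attributes from a single log line. */
  static Map<String, String> parseAttributes(String logLine) {
    Map<String, String> attrs = new LinkedHashMap<>();
    Matcher m = ATTR.matcher(logLine);
    while (m.find()) {
      attrs.put(m.group(1), m.group(2));
    }
    return attrs;
  }

  /** Groups the distinct dagIds by queryId, in order of first appearance. */
  static Map<String, Set<String>> dagIdsByQueryId(Iterable<String> logLines) {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    for (String line : logLines) {
      Map<String, String> attrs = parseAttributes(line);
      String queryId = attrs.get("queryId");
      if (queryId == null) {
        continue; // line carries no query context
      }
      Set<String> dagIds = result.computeIfAbsent(queryId, k -> new LinkedHashSet<>());
      String dagId = attrs.get("dagId");
      if (dagId != null) {
        dagIds.add(dagId); // lines without a dagId (e.g. compilation) add nothing
      }
    }
    return result;
  }
}
```

With the dagId cleared during compilation, lines from the re-execution would simply contribute no dagId until the new DAG is submitted, while still being grouped under the same queryId.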
{code}
<14>1 2024-03-07T16:29:54.309Z hiveserver2-0 hiveserver2 1
87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="reexec.ReExecDriver"
dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION"
queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced"
sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab"
thread="HiveServer2-Background-Pool: Thread-1432633"] Preparing to re-execute
query
{code}
So we might want to remove the dagId from the MDC/NDC around this point:
"Preparing to re-execute query".
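A minimal sketch of the proposed behaviour follows. It is not the actual Hive change: the thread-local map below is a dependency-free stand-in for the logging framework's MDC, and the prepareToReExecute hook is a hypothetical name for the point where "Preparing to re-execute query" is logged. Only the dagId and queryId key names are taken from the log lines above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stand-in for a per-thread logging context (MDC). A real change would call
 * the logging framework's own MDC API; this class only mimics put/get/remove
 * so that the sketch runs without dependencies.
 */
final class LogContextSketch {
  static final String DAG_ID_KEY = "dagId";     // key names as they appear in the logs
  static final String QUERY_ID_KEY = "queryId";

  private static final ThreadLocal<Map<String, String>> CONTEXT =
      ThreadLocal.withInitial(HashMap::new);

  static void put(String key, String value) {
    CONTEXT.get().put(key, value);
  }

  static String get(String key) {
    return CONTEXT.get().get(key);
  }

  static void remove(String key) {
    CONTEXT.get().remove(key);
  }

  /** Hypothetical hook, invoked where "Preparing to re-execute query" is logged. */
  static void prepareToReExecute() {
    // Drop only the stale dagId; the queryId stays so that attempts remain correlated.
    remove(DAG_ID_KEY);
  }

  public static void main(String[] args) {
    put(QUERY_ID_KEY, "hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced");
    put(DAG_ID_KEY, "dag_1709708735265_0007_94");
    prepareToReExecute();
    System.out.println("dagId=" + get(DAG_ID_KEY) + " queryId=" + get(QUERY_ID_KEY));
  }
}
```

Removing only the dagId key keeps the queryId and sessionId available for the compilation messages of the next attempt; the new dagId would be put into the context again once the AM returns it on DAG submission.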
> Clear dagId from MDC/NDC when re-executing the query with new dagId
> -------------------------------------------------------------------
>
> Key: HIVE-28112
> URL: https://issues.apache.org/jira/browse/HIVE-28112
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Priority: Major
>