[ 
https://issues.apache.org/jira/browse/HIVE-28112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-28112:
--------------------------------
    Description: 
1. The DAG fails (dagId: dag_1709708735265_0007_94, 
queryId: hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced):
{code}
<14>1 2024-03-07T16:29:54.292Z hiveserver2-0 hiveserver2 1 
87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="client.DAGClientImpl" 
dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION" 
queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced" 
sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab" 
thread="HiveServer2-Background-Pool: Thread-1432633"] DAG completed. 
FinalState=FAILED
{code}

2. The lost-AM plugin (ReExecuteLostAMQueryPlugin) decides to re-execute the 
query:
{code}
<14>1 2024-03-07T16:29:54.301Z hiveserver2-0 hiveserver2 1 
87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 
class="reexec.ReExecuteLostAMQueryPlugin" dagId="dag_1709708735265_0007_94" 
level="INFO" operationLogLevel="EXECUTION" 
queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced" 
sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab" 
thread="HiveServer2-Background-Pool: Thread-1432633"] Got exception message: AM 
record not found (likely died) in zookeeper for application id: 
application_1709708735265_0007 retryPossible: true
{code}

3. Messages that belong to the new execution (when there is no DAG at all yet) 
still show the previous dagId, which is confusing, e.g.:
{code}
<14>1 2024-03-07T16:29:54.348Z hiveserver2-0 hiveserver2 1 
87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="ql.Driver" 
dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION" 
queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced" 
sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab" 
thread="HiveServer2-Background-Pool: Thread-1432633"] Compiling command(...
{code}
While a query is being compiled, a dag id is not supposed to be present at 
all: the dag id is obtained from the AM upon DAG submission.
The new dag id will correspond to the same Hive query id, so the Hive query id 
can be used to keep the connection between query attempts.

It is even more confusing once the new DAG (dag_1709708735265_0047_41) has 
been submitted and the old dag id (dag_1709708735265_0007_94) is still shown in 
the MDC:
{code}
<14>1 2024-03-07T16:29:55.083Z hiveserver2-0 hiveserver2 1 
87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="tez.TezTask" 
dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION" 
queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced" 
sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab" 
thread="HiveServer2-Background-Pool: Thread-1432633"] HS2 Host: 
[hiveserver2-0], Query ID: 
[hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced], Dag ID: 
[dag_1709708735265_0047_41], DAG Session ID: [application_1709708735265_0047]
{code}

The last message for which the failed (previous) dag id still makes sense is:
{code}
<14>1 2024-03-07T16:29:54.309Z hiveserver2-0 hiveserver2 1 
87c9f023-150f-4923-b044-3b4207506951 [mdc@18060 class="reexec.ReExecDriver" 
dagId="dag_1709708735265_0007_94" level="INFO" operationLogLevel="EXECUTION" 
queryId="hive_20240307162529_5ae040ec-7d46-4d79-9730-f4cd3f184ced" 
sessionId="0ac1f6bf-98ea-4442-bb38-f5da5a36aeab" 
thread="HiveServer2-Background-Pool: Thread-1432633"] Preparing to re-execute 
query
{code} 
So we might want to remove the dagId from the MDC/NDC around this point: 
"Preparing to re-execute query".
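
A minimal sketch of the idea, not Hive's actual code: LogContext below is a 
hypothetical stand-in for the per-thread logging context (with slf4j this 
would be MDC.put / MDC.remove, with log4j2 ThreadContext.put / 
ThreadContext.remove), and ReExecSketch, onDagSubmitted and 
prepareToReExecute are made-up names for illustration. It assumes the dagId is 
put into the context when the DAG is submitted and shows it being removed 
right after the "Preparing to re-execute query" message, while the queryId is 
kept to connect the attempts.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-thread logging context such as MDC. */
final class LogContext {
    private static final ThreadLocal<Map<String, String>> CTX =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static void remove(String key) { CTX.get().remove(key); }
}

/** Illustrative re-execution flow; names are assumptions, not Hive APIs. */
final class ReExecSketch {
    static final String DAG_ID_KEY = "dagId";
    static final String QUERY_ID_KEY = "queryId";

    /** Assumed hook: the AM returned a dag id upon DAG submission. */
    static void onDagSubmitted(String dagId) {
        LogContext.put(DAG_ID_KEY, dagId);
    }

    /** Logs the last message of the failed attempt, then drops its dagId. */
    static void prepareToReExecute() {
        // The failed dag id is still meaningful for this message.
        System.out.println(format("Preparing to re-execute query"));
        // From here on (compilation, new submission) the old id is stale.
        // The queryId stays and links the attempts together.
        LogContext.remove(DAG_ID_KEY);
    }

    /** Renders a message with the context, omitting dagId when absent. */
    static String format(String message) {
        String dagId = LogContext.get(DAG_ID_KEY);
        return "[queryId=" + LogContext.get(QUERY_ID_KEY)
            + (dagId == null ? "" : " dagId=" + dagId) + "] " + message;
    }
}
```

With this ordering, a "Compiling command" message logged after 
prepareToReExecute() would carry only the queryId, and the dagId would 
reappear only once onDagSubmitted() is called for the new DAG.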



> Clear dagId from MDC/NDC when re-executing the query with new dagId
> -------------------------------------------------------------------
>
>                 Key: HIVE-28112
>                 URL: https://issues.apache.org/jira/browse/HIVE-28112
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major


