[
https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
guojh updated HIVE-24467:
-------------------------
Description:
When Hive executes jobs in parallel (controlled by the “hive.exec.parallel”
parameter), ConditionalTasks also remove their unselected tasks in parallel.
Because this removal is not thread-safe, some tasks may not be removed from the
dependent task tree. This is a very serious bug: it prevents some stage tasks
from being triggered for execution.
In our production cluster, a query ran three conditional tasks in parallel.
After applying the patch from HIVE-21638, we found that Stage-3 was missed and
never submitted to the runnable list, because its parent Stage-31 was not done.
But Stage-31 should have been removed, because it was not selected.
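To make the failure mode concrete, here is a minimal sketch of the kind of guarded removal that avoids it. It is not Hive's code: the class ParentListSketch and its methods are hypothetical names, and the stage ids are borrowed from this report only as labels. Several workers each remove one unselected parent from the parent list of a shared child (think of Stage-3), and every mutation goes through the same monitor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; none of these names come from Hive. */
class ParentListSketch {
    // Parent list of one shared child. ArrayList itself is not thread-safe,
    // so every access below is guarded by this object's monitor.
    private final List<String> parents = new ArrayList<>();

    ParentListSketch(List<String> initialParents) {
        parents.addAll(initialParents);
    }

    /** Removals from different threads are serialized by the monitor. */
    synchronized void removeParent(String parentId) {
        parents.remove(parentId);
    }

    synchronized List<String> snapshot() {
        return new ArrayList<>(parents);
    }

    /** Removes each unselected parent on a worker thread and returns what is left. */
    static List<String> pruneInParallel(List<String> all, List<String> unselected) {
        ParentListSketch child = new ParentListSketch(all);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String id : unselected) {
            pool.execute(() -> child.removeParent(id));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return child.snapshot();
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList(
            "Stage-16", "Stage-31", "Stage-32", "Stage-20", "Stage-35", "Stage-36");
        List<String> unselected = Arrays.asList(
            "Stage-31", "Stage-32", "Stage-35", "Stage-36");
        System.out.println(pruneInParallel(all, unselected));
    }
}
```

Without the synchronized keyword, two removals on the same ArrayList can interleave and leave a stale parent behind, which matches the symptom reported here.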
The stage dependencies are below:
{code:java}
STAGE DEPENDENCIES:
Stage-41 is a root stage
Stage-26 depends on stages: Stage-41
Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, Stage-2
Stage-39 has a backup stage: Stage-2
Stage-23 depends on stages: Stage-39
Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
Stage-5
Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
Stage-51 depends on stages: Stage-0
Stage-4
Stage-6
Stage-7 depends on stages: Stage-6
Stage-40 has a backup stage: Stage-2
Stage-24 depends on stages: Stage-40
Stage-2
Stage-44 is a root stage
Stage-30 depends on stages: Stage-44
Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, Stage-12
Stage-42 has a backup stage: Stage-12
Stage-27 depends on stages: Stage-42
Stage-43 has a backup stage: Stage-12
Stage-28 depends on stages: Stage-43
Stage-12
Stage-47 is a root stage
Stage-34 depends on stages: Stage-47
Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, Stage-16
Stage-45 has a backup stage: Stage-16
Stage-31 depends on stages: Stage-45
Stage-46 has a backup stage: Stage-16
Stage-32 depends on stages: Stage-46
Stage-16
Stage-50 is a root stage
Stage-38 depends on stages: Stage-50
Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, Stage-20
Stage-48 has a backup stage: Stage-20
Stage-35 depends on stages: Stage-48
Stage-49 has a backup stage: Stage-20
Stage-36 depends on stages: Stage-49
Stage-20
{code}
The stage task execution log is below. Stage-33 is a conditional task that
consists of Stage-45, Stage-46, and Stage-16. Stage-16 was launched, so Stage-45
and Stage-46 should be removed from the dependent tree. Stage-31 is a child of
Stage-45 and a parent of Stage-3, so Stage-31 should be removed too.
{code:java}
2020-12-03T01:09:50,939 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 1 out of 17
2020-12-03T01:09:50,940 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-26:MAPRED] in parallel
2020-12-03T01:09:50,941 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 2 out of 17
2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-30:MAPRED] in parallel
2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 3 out of 17
2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-34:MAPRED] in parallel
2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 4 out of 17
2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-38:MAPRED] in parallel
2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
2020-12-03T01:10:34,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 5 out of 17
2020-12-03T01:10:34,947 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-16:MAPRED] in parallel
2020-12-03T01:10:34,948 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 6 out of 17
2020-12-03T01:10:34,948 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-12:MAPRED] in parallel
2020-12-03T01:10:34,949 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 7 out of 17
2020-12-03T01:10:34,950 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-20:MAPRED] in parallel
2020-12-03T01:10:34,950 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-25:CONDITIONAL] in parallel
2020-12-03T01:10:36,950 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 8 out of 17
2020-12-03T01:10:36,951 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-2:MAPRED] in parallel
2020-12-01T22:20:17,774 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Task:Stage-3:MAPRED Parent:Stage-31:MAPRED isDone:false
2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage: Stage-3for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e
2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e : FAILED: Some Execute Stage miss error
{code}
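The pruning expected for the Stage-33 branch can be sketched as follows. This is a single-threaded illustration under stated assumptions, not Hive's implementation: StageSketch, dependsOn, detach and isRunnable are hypothetical names, and only the Stage-33 part of the plan above is modelled (Stage-3's other parents are left out). Detaching an unselected stage removes it from each child's parent list, and a child left with no parents is detached in turn.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a stage in the plan; this is not Hive's Task class. */
class StageSketch {
    final String id;
    final List<StageSketch> parents = new ArrayList<>();
    final List<StageSketch> children = new ArrayList<>();
    boolean done;

    StageSketch(String id) {
        this.id = id;
    }

    void dependsOn(StageSketch parent) {
        parents.add(parent);
        parent.children.add(this);
    }

    /** Detaches an unselected stage; a child left with no parents is detached as well. */
    void detach() {
        for (StageSketch child : children) {
            child.parents.remove(this);
            if (child.parents.isEmpty()) {
                child.detach();
            }
        }
    }

    /** A stage can be queued once every remaining parent has finished. */
    boolean isRunnable() {
        for (StageSketch parent : parents) {
            if (!parent.done) {
                return false;
            }
        }
        return true;
    }

    List<String> parentIds() {
        List<String> ids = new ArrayList<>();
        for (StageSketch parent : parents) {
            ids.add(parent.id);
        }
        return ids;
    }

    /** Builds only the Stage-33 branch of the plan, prunes it, and returns Stage-3. */
    static StageSketch pruneStage33Branch() {
        StageSketch s45 = new StageSketch("Stage-45");
        StageSketch s46 = new StageSketch("Stage-46");
        StageSketch s16 = new StageSketch("Stage-16");
        StageSketch s31 = new StageSketch("Stage-31");
        StageSketch s32 = new StageSketch("Stage-32");
        StageSketch s3 = new StageSketch("Stage-3");
        s31.dependsOn(s45);
        s32.dependsOn(s46);
        s3.dependsOn(s16);
        s3.dependsOn(s31);
        s3.dependsOn(s32);
        // Stage-33 selected Stage-16, so the other two candidates are detached.
        s45.detach();
        s46.detach();
        return s3;
    }

    public static void main(String[] args) {
        StageSketch s3 = pruneStage33Branch();
        System.out.println(s3.parentIds());   // [Stage-16]
        s3.parents.get(0).done = true;        // Stage-16 finishes
        System.out.println(s3.isRunnable());  // true
    }
}
```

In the failing run, Stage-31 stayed in Stage-3's parent list, so a check like isRunnable could never succeed and Stage-3 was reported as a missed stage.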
was:
When Hive executes jobs in parallel (controlled by the “hive.exec.parallel”
parameter), ConditionalTasks also remove their unselected tasks in parallel.
Because this removal is not thread-safe, some tasks may not be removed from the
dependent task tree. This is a very serious bug: it prevents some stage tasks
from being triggered for execution.
In our production cluster, a query ran three conditional tasks in parallel.
After applying the patch from HIVE-21638, we found that Stage-3 was missed and
never submitted to the runnable list, because its parent Stage-31 was not done.
But Stage-31 should have been removed, because it was not selected.
{code:java}
2020-12-01T22:18:13,764 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Starting task [Stage-38:MAPRED] in parallel
2020-12-01T22:19:01,766 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Starting task [Stage-25:CONDITIONAL] in parallel
2020-12-01T22:19:01,766 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
2020-12-01T22:19:01,766 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
2020-12-01T22:20:17,774 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Task:Stage-3:MAPRED Parent:Stage-31:MAPRED isDone:false
2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage: Stage-3for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e
2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e : FAILED: Some Execute Stage miss error
{code}
> ConditionalTask remove tasks that not selected exists thread safety problem
> ---------------------------------------------------------------------------
>
> Key: HIVE-24467
> URL: https://issues.apache.org/jira/browse/HIVE-24467
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.3.4
> Reporter: guojh
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)