[ https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

guojh updated HIVE-24467:
-------------------------
    Description: 
When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
parameter), several ConditionalTasks can remove their unselected tasks 
concurrently. Because this removal is not thread-safe, some tasks may not be 
removed from the dependency tree. This is a very serious bug: it prevents some 
stage tasks from ever being triggered for execution.
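
The lost-removal scenario described above can be illustrated with a small Java sketch. It is a hypothetical model, not Hive's actual Task or ConditionalTask code: the class SharedParentList and its methods are invented for illustration. It assumes that the shared state is a child stage's list of parents, and that several resolver threads each remove one unselected parent from it. Guarding every access with the same lock is one way to make sure no removal is lost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not Hive's Task class: one child stage whose parent
// list is mutated by several resolver threads at once.
class SharedParentList {
    private final List<String> parents = new ArrayList<>();

    SharedParentList(List<String> initialParents) {
        parents.addAll(initialParents);
    }

    // Every mutation and read takes the same monitor, so concurrent
    // removals cannot interleave inside the underlying ArrayList.
    synchronized void removeParent(String parent) {
        parents.remove(parent);
    }

    synchronized List<String> snapshot() {
        return new ArrayList<>(parents);
    }

    // Removes each name in toRemove on its own thread and returns what is left.
    static List<String> pruneInParallel(List<String> initialParents,
                                        List<String> toRemove) {
        SharedParentList child = new SharedParentList(initialParents);
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, toRemove.size()));
        for (String parent : toRemove) {
            pool.execute(() -> child.removeParent(parent));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pruning did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return child.snapshot();
    }

    public static void main(String[] args) {
        List<String> left = pruneInParallel(
            Arrays.asList("Stage-16", "Stage-31", "Stage-32"),
            Arrays.asList("Stage-31", "Stage-32"));
        System.out.println(left);
    }
}
```

If the two synchronized keywords were dropped, the plain ArrayList would be mutated by several threads without coordination, which is the kind of access under which a removal can silently be lost.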

In our production cluster, a query ran three conditional tasks in parallel. 
After applying the patch from HIVE-21638, we found that Stage-3 was missed and 
never submitted to the runnable list because its parent, Stage-31, was not 
done. But Stage-31 should have been removed because it was not selected.

The stage dependencies are below:
{code:java}
STAGE DEPENDENCIES:
  Stage-41 is a root stage
  Stage-26 depends on stages: Stage-41
  Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, Stage-2
  Stage-39 has a backup stage: Stage-2
  Stage-23 depends on stages: Stage-39
  Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
  Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
  Stage-5
  Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
  Stage-51 depends on stages: Stage-0
  Stage-4
  Stage-6
  Stage-7 depends on stages: Stage-6
  Stage-40 has a backup stage: Stage-2
  Stage-24 depends on stages: Stage-40
  Stage-2
  Stage-44 is a root stage
  Stage-30 depends on stages: Stage-44
  Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, Stage-12
  Stage-42 has a backup stage: Stage-12
  Stage-27 depends on stages: Stage-42
  Stage-43 has a backup stage: Stage-12
  Stage-28 depends on stages: Stage-43
  Stage-12
  Stage-47 is a root stage
  Stage-34 depends on stages: Stage-47
  Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, Stage-16
  Stage-45 has a backup stage: Stage-16
  Stage-31 depends on stages: Stage-45
  Stage-46 has a backup stage: Stage-16
  Stage-32 depends on stages: Stage-46
  Stage-16
  Stage-50 is a root stage
  Stage-38 depends on stages: Stage-50
  Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, Stage-20
  Stage-48 has a backup stage: Stage-20
  Stage-35 depends on stages: Stage-48
  Stage-49 has a backup stage: Stage-20
  Stage-36 depends on stages: Stage-49
  Stage-20
{code}
The stage task execution log is below. Stage-33 is a conditional task that 
consists of Stage-45, Stage-46, and Stage-16. Stage-16 was launched, so 
Stage-45 and Stage-46 should be removed from the dependency tree. Stage-31 is 
a child of Stage-45 and a parent of Stage-3, so Stage-31 should be removed too.
{code:java}
2020-12-03T01:09:50,939  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 1 out of 17
2020-12-03T01:09:50,940  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-26:MAPRED] in parallel
2020-12-03T01:09:50,941  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 2 out of 17
2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-30:MAPRED] in parallel
2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 3 out of 17
2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-34:MAPRED] in parallel
2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 4 out of 17
2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-38:MAPRED] in parallel
2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
2020-12-03T01:10:34,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 5 out of 17
2020-12-03T01:10:34,947  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-16:MAPRED] in parallel
2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 6 out of 17
2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-12:MAPRED] in parallel
2020-12-03T01:10:34,949  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 7 out of 17
2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-20:MAPRED] in parallel
2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-25:CONDITIONAL] in parallel
2020-12-03T01:10:36,950  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 8 out of 17
2020-12-03T01:10:36,951  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-2:MAPRED] in parallel

2020-12-01T22:20:17,774  INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Task:Stage-3:MAPRED Parent:Stage-31:MAPRED isDone:false
2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage: Stage-3for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e
2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e : FAILED: Some Execute Stage miss error
{code}
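
To make the expected pruning concrete, here is a minimal single-threaded Java sketch of the Stage-33 branch from the dependency listing. It is an assumption-laden model rather than Hive's implementation: the StageNode class, its dependsOn and detachFromChildren methods, and the rule "a child left with no parents is detached as well" are all hypothetical, and backup stages are ignored. Only the stage names and edges come from the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stage node; names and methods are illustrative only.
class StageNode {
    final String name;
    final List<StageNode> parents = new ArrayList<>();
    final List<StageNode> children = new ArrayList<>();

    StageNode(String name) {
        this.name = name;
    }

    // Declares that this stage depends on the given parent.
    StageNode dependsOn(StageNode parent) {
        parents.add(parent);
        parent.children.add(this);
        return this;
    }

    // Detaches this stage from its children. A child left with no parents
    // can never be reached, so it is detached from its own children too.
    void detachFromChildren() {
        for (StageNode child : children) {
            child.parents.remove(this);
            if (child.parents.isEmpty()) {
                child.detachFromChildren();
            }
        }
    }

    List<String> parentNames() {
        List<String> names = new ArrayList<>();
        for (StageNode parent : parents) {
            names.add(parent.name);
        }
        return names;
    }

    // Builds the Stage-33 branch, treats Stage-16 as the selected task, and
    // returns the parents Stage-3 still waits for on that branch.
    static List<String> pruneStage33Branch() {
        StageNode s45 = new StageNode("Stage-45");
        StageNode s46 = new StageNode("Stage-46");
        StageNode s16 = new StageNode("Stage-16");
        StageNode s31 = new StageNode("Stage-31").dependsOn(s45);
        StageNode s32 = new StageNode("Stage-32").dependsOn(s46);
        StageNode s3 = new StageNode("Stage-3")
            .dependsOn(s16).dependsOn(s31).dependsOn(s32);

        // Stage-45 and Stage-46 were not selected by the conditional task.
        for (StageNode unselected : Arrays.asList(s45, s46)) {
            unselected.detachFromChildren();
        }
        return s3.parentNames();
    }

    public static void main(String[] args) {
        System.out.println(pruneStage33Branch()); // prints [Stage-16]
    }
}
```

In this model Stage-3 ends up waiting only for Stage-16 on this branch. In the failing run, by contrast, Stage-31 was still listed as a parent of Stage-3 with isDone:false, which is what the "Miss stage" error reports.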
 

  was:
When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
parameter), several ConditionalTasks can remove their unselected tasks 
concurrently. Because this removal is not thread-safe, some tasks may not be 
removed from the dependency tree. This is a very serious bug: it prevents some 
stage tasks from ever being triggered for execution.

In our production cluster, a query ran three conditional tasks in parallel. 
After applying the patch from HIVE-21638, we found that Stage-3 was missed and 
never submitted to the runnable list because its parent, Stage-31, was not 
done. But Stage-31 should have been removed because it was not selected.
{code:java}
2020-12-01T22:18:13,764 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Starting task [Stage-38:MAPRED] in parallel
2020-12-01T22:19:01,766 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Starting task [Stage-25:CONDITIONAL] in parallel
2020-12-01T22:19:01,766 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
2020-12-01T22:19:01,766 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel

2020-12-01T22:20:17,774  INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Task:Stage-3:MAPRED Parent:Stage-31:MAPRED isDone:false
2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage: Stage-3for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e
2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e : FAILED: Some Execute Stage miss error
{code}
 


> ConditionalTask remove tasks that not selected exists thread safety problem
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24467
>                 URL: https://issues.apache.org/jira/browse/HIVE-24467
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.3.4
>            Reporter: guojh
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
