[ 
https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-24467:
---------------------------------

    Assignee: Xi Chen  (was: guojh)

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24467
>                 URL: https://issues.apache.org/jira/browse/HIVE-24467
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.0, 2.3.4, 3.1.2
>            Reporter: guojh
>            Assignee: Xi Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
> parameter), ConditionalTasks remove their unselected tasks in parallel. 
> Because this removal is not thread-safe, some tasks may not be removed from 
> the dependent task tree. This is a very serious bug: it prevents some stage 
> tasks from ever being triggered.
> In our production cluster, a query ran three conditional tasks in parallel. 
> After applying the patch from HIVE-21638, we found that Stage-3 was missed and 
> never submitted to the runnable list because its parent Stage-31 was not done. 
> But Stage-31 should have been removed, because it was not selected.
> The stage dependencies are below:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-41 is a root stage
>   Stage-26 depends on stages: Stage-41
>   Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, 
> Stage-2
>   Stage-39 has a backup stage: Stage-2
>   Stage-23 depends on stages: Stage-39
>   Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, 
> Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
>   Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
>   Stage-5
>   Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
>   Stage-51 depends on stages: Stage-0
>   Stage-4
>   Stage-6
>   Stage-7 depends on stages: Stage-6
>   Stage-40 has a backup stage: Stage-2
>   Stage-24 depends on stages: Stage-40
>   Stage-2
>   Stage-44 is a root stage
>   Stage-30 depends on stages: Stage-44
>   Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, 
> Stage-12
>   Stage-42 has a backup stage: Stage-12
>   Stage-27 depends on stages: Stage-42
>   Stage-43 has a backup stage: Stage-12
>   Stage-28 depends on stages: Stage-43
>   Stage-12
>   Stage-47 is a root stage
>   Stage-34 depends on stages: Stage-47
>   Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, 
> Stage-16
>   Stage-45 has a backup stage: Stage-16
>   Stage-31 depends on stages: Stage-45
>   Stage-46 has a backup stage: Stage-16
>   Stage-32 depends on stages: Stage-46
>   Stage-16
>   Stage-50 is a root stage
>   Stage-38 depends on stages: Stage-50
>   Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, 
> Stage-20
>   Stage-48 has a backup stage: Stage-20
>   Stage-35 depends on stages: Stage-48
>   Stage-49 has a backup stage: Stage-20
>   Stage-36 depends on stages: Stage-49
>   Stage-20
> {code}
> The stage task execution log is below. We can see that Stage-33 is a 
> conditional task consisting of Stage-45, Stage-46, and Stage-16. Stage-16 is 
> launched, so Stage-45 and Stage-46 should be removed from the dependency 
> tree. Stage-31 is a child of Stage-45 and a parent of Stage-3, so Stage-31 
> should be removed too. As the log below shows, Stage-31 is still in the 
> parent list of Stage-3, which should not happen.
> {code:java}
> 2020-12-03T01:09:50,939  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 1 out of 17
> 2020-12-03T01:09:50,940  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-26:MAPRED] in parallel
> 2020-12-03T01:09:50,941  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 2 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-30:MAPRED] in parallel
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 3 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-34:MAPRED] in parallel
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 4 out of 17
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-38:MAPRED] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
> 2020-12-03T01:10:34,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 5 out of 17
> 2020-12-03T01:10:34,947  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-16:MAPRED] in parallel
> 2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 6 out of 17
> 2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-12:MAPRED] in parallel
> 2020-12-03T01:10:34,949  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 7 out of 17
> 2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-20:MAPRED] in parallel
> 2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-25:CONDITIONAL] in parallel
> 2020-12-03T01:10:36,950  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 8 out of 17
> 2020-12-03T01:10:36,951  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> 2020-12-01T22:20:17,774  INFO [HiveServer2-Background-Pool: Thread-233156] 
> ql.Driver: Task:Stage-3:MAPRED Parent:Stage-31:MAPRED isDone:false
> 2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] 
> ql.Driver: Miss stage: Stage-3for queryid 
> hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e
> 2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] 
> ql.Driver: Miss stage for queryid 
> hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e : FAILED: Some 
> Execute Stage miss error
> {code}
>  
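
The failure mode described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the Hive code and not the patch attached to this issue: the Node class, the prune method, and the GRAPH_LOCK object are hypothetical names, and the plan is reduced to the three conditional branches that ran in parallel. Each unselected stage detaches itself from its children, recursing into any child left without parents. Because all three branches end up editing the shared parent list of Stage-3, the sketch guards the whole removal with one lock. Without such a guard, concurrent removals from a plain ArrayList can be lost, which would leave a stage such as Stage-31 in the parent list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConditionalPruneSketch {

    /** Hypothetical stand-in for a Hive task; not the real Task class. */
    static final class Node {
        final String name;
        final List<Node> parents = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        void addChild(Node child) {
            children.add(child);
            child.parents.add(this);
        }
    }

    // One lock for the whole graph: pruning edits lists owned by other nodes.
    private static final Object GRAPH_LOCK = new Object();

    /** Detach an unselected node from its children, recursing into orphaned children. */
    public static void prune(Node unselected) {
        synchronized (GRAPH_LOCK) { // intrinsic locks are reentrant, so recursion is safe
            for (Node child : unselected.children) {
                child.parents.remove(unselected);
                if (child.parents.isEmpty()) {
                    prune(child);
                }
            }
        }
    }

    /** Builds a reduced version of the reported plan and prunes the unselected stages in parallel. */
    public static List<String> remainingParentsOfStage3() throws InterruptedException {
        Node stage3 = new Node("Stage-3");
        List<Node> unselected = new ArrayList<>();
        String[][] branches = {
            // {unselected, its child, unselected, its child, selected backup stage}
            {"Stage-42", "Stage-27", "Stage-43", "Stage-28", "Stage-12"},
            {"Stage-45", "Stage-31", "Stage-46", "Stage-32", "Stage-16"},
            {"Stage-48", "Stage-35", "Stage-49", "Stage-36", "Stage-20"},
        };
        for (String[] b : branches) {
            for (int i = 0; i < 4; i += 2) {
                Node skipped = new Node(b[i]);
                Node child = new Node(b[i + 1]);
                skipped.addChild(child);
                child.addChild(stage3);
                unselected.add(skipped);
            }
            new Node(b[4]).addChild(stage3); // the selected stage stays a parent of Stage-3
        }

        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (Node n : unselected) {
            pool.submit(() -> prune(n));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        List<String> names = new ArrayList<>();
        synchronized (GRAPH_LOCK) {
            for (Node p : stage3.parents) {
                names.add(p.name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        // prints [Stage-12, Stage-16, Stage-20]
        System.out.println(remainingParentsOfStage3());
    }
}
```

A single coarse lock is the simplest choice for a sketch like this, since pruning happens once per conditional task and is short. Whether the real fix should lock the graph, synchronize the removal method, or use a different structure is a design decision for the actual patch.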



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
