[ 
https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384052#comment-17384052
 ] 

Xi Chen commented on HIVE-24467:
--------------------------------

We also ran into this problem, and the result was lost data.

Our query is a dynamic INSERT OVERWRITE with UNION ALL, of the form:
{code:java}
INSERT OVERWRITE TABLE dest_table PARTITION(...)
SELECT ... FROM
(
  SELECT ... FROM table_a JOIN table_b ...
  UNION ALL
  SELECT ... FROM table_a WHERE ...
  UNION ALL
  SELECT ... FROM table_c JOIN table_d ...
  UNION ALL
  SELECT ... FROM table_c WHERE ...
) mid
JOIN table_e ...;{code}
The stage dependencies are:
{code:java}
STAGE DEPENDENCIES:
  Stage-5 is a root stage
  Stage-6 depends on stages: Stage-5
  Stage-22 depends on stages: Stage-6 , consists of Stage-26, Stage-1
  Stage-26 has a backup stage: Stage-1
  Stage-21 depends on stages: Stage-26
  Stage-25 depends on stages: Stage-1, Stage-12, Stage-21, Stage-23
  Stage-20 depends on stages: Stage-25
  Stage-0 depends on stages: Stage-20
  Stage-1
  Stage-14 is a root stage
  Stage-15 depends on stages: Stage-14
  Stage-24 depends on stages: Stage-15 , consists of Stage-27, Stage-12
  Stage-27 has a backup stage: Stage-12
  Stage-23 depends on stages: Stage-27
  Stage-12
{code}
The problem is triggered in this way:
 # Both Stage-22 and Stage-24 are ConditionalTasks and contain a mapjoin. 
 # The tasks they depend on, Stage-6 and Stage-15, have similar input data sizes and 
finish at the same time. 
 # Thus the two ConditionalTasks start at the same time and hit this race 
condition, so the backup stages Stage-1 and Stage-12 are not correctly removed 
from Stage-25's dependency list. 
 # As a result, Stage-25, Stage-20 and Stage-0 are never triggered. 
 # Stage-0 is a MoveTask, so the data is completely lost, yet the query still reports success!
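
The effect of steps 3 and 4 can be sketched with a small, deterministic model. This is only an illustration, not Hive's code: the class {{StageDependencySketch}}, the {{Stage}} type and the method {{stage25Runnable}} are hypothetical names, and the sketch assumes a stage becomes runnable only once every parent still in its parent list is done. It does not reproduce the race itself, only what happens to Stage-25 when the backup stages are left in its parent list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StageDependencySketch {

    /** Hypothetical stand-in for a task in the plan; not Hive's real Task class. */
    static final class Stage {
        final String name;
        final List<Stage> parents = new ArrayList<>();
        boolean done;

        Stage(String name) {
            this.name = name;
        }

        void dependsOn(Stage... ps) {
            parents.addAll(Arrays.asList(ps));
        }

        /** Assumption: a stage is runnable only when every remaining parent is done. */
        boolean isRunnable() {
            for (Stage p : parents) {
                if (!p.done) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Models "Stage-25 depends on stages: Stage-1, Stage-12, Stage-21, Stage-23".
     * The mapjoin branches are selected, so Stage-21 and Stage-23 finish, while the
     * backup stages Stage-1 and Stage-12 never run.
     */
    static boolean stage25Runnable(boolean backupsRemoved) {
        Stage s1 = new Stage("Stage-1");
        Stage s12 = new Stage("Stage-12");
        Stage s21 = new Stage("Stage-21");
        Stage s23 = new Stage("Stage-23");
        Stage s25 = new Stage("Stage-25");
        s25.dependsOn(s1, s12, s21, s23);

        // The selected branches complete.
        s21.done = true;
        s23.done = true;

        // Correct behaviour: the unselected backup stages are dropped from the parent list.
        if (backupsRemoved) {
            s25.parents.remove(s1);
            s25.parents.remove(s12);
        }
        return s25.isRunnable();
    }

    public static void main(String[] args) {
        System.out.println("backups removed:     " + stage25Runnable(true));
        System.out.println("backups left behind: " + stage25Runnable(false));
    }
}
```

With the backups removed, Stage-25 is runnable; with them left behind it never is, so under this model Stage-20 and the MoveTask Stage-0 would never be scheduled either.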

The last stdout lines of the Hive query that lost data are:

!image-2021-07-20-18-22-20-218.png!
The normal output should be:

!image-2021-07-20-18-24-10-716.png!

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24467
>                 URL: https://issues.apache.org/jira/browse/HIVE-24467
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.3.4
>            Reporter: guojh
>            Assignee: guojh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
> parameter), ConditionalTasks remove the tasks that were not selected in parallel. 
> Because of thread safety issues, some tasks may not be removed from the 
> dependent task tree. This is a very serious bug, which causes some stage tasks 
> never to be triggered for execution.
> In our production cluster, the query ran three conditional tasks in parallel. 
> After applying the patch from HIVE-21638, we found that Stage-3 was missed and not 
> submitted to the runnable list because its parent Stage-31 was not done. But Stage-31 
> should have been removed, since it was not selected.
> The stage dependencies are below:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-41 is a root stage
>   Stage-26 depends on stages: Stage-41
>   Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, Stage-2
>   Stage-39 has a backup stage: Stage-2
>   Stage-23 depends on stages: Stage-39
>   Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
>   Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
>   Stage-5
>   Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
>   Stage-51 depends on stages: Stage-0
>   Stage-4
>   Stage-6
>   Stage-7 depends on stages: Stage-6
>   Stage-40 has a backup stage: Stage-2
>   Stage-24 depends on stages: Stage-40
>   Stage-2
>   Stage-44 is a root stage
>   Stage-30 depends on stages: Stage-44
>   Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, Stage-12
>   Stage-42 has a backup stage: Stage-12
>   Stage-27 depends on stages: Stage-42
>   Stage-43 has a backup stage: Stage-12
>   Stage-28 depends on stages: Stage-43
>   Stage-12
>   Stage-47 is a root stage
>   Stage-34 depends on stages: Stage-47
>   Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, Stage-16
>   Stage-45 has a backup stage: Stage-16
>   Stage-31 depends on stages: Stage-45
>   Stage-46 has a backup stage: Stage-16
>   Stage-32 depends on stages: Stage-46
>   Stage-16
>   Stage-50 is a root stage
>   Stage-38 depends on stages: Stage-50
>   Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, Stage-20
>   Stage-48 has a backup stage: Stage-20
>   Stage-35 depends on stages: Stage-48
>   Stage-49 has a backup stage: Stage-20
>   Stage-36 depends on stages: Stage-49
>   Stage-20
> {code}
> The stage task execution log is below. We can see that Stage-33 is a conditional task 
> consisting of Stage-45, Stage-46 and Stage-16. Stage-16 is launched, so Stage-45 
> and Stage-46 should be removed from the dependent tree. Stage-31 is a child of 
> Stage-45 and a parent of Stage-3, so Stage-31 should be removed too. As the 
> log below shows, Stage-31 is still in the parent list of Stage-3, which 
> should not happen.
> {code:java}
> 2020-12-03T01:09:50,939  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 1 out of 17
> 2020-12-03T01:09:50,940  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-26:MAPRED] in parallel
> 2020-12-03T01:09:50,941  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 2 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-30:MAPRED] in parallel
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 3 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-34:MAPRED] in parallel
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 4 out of 17
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-38:MAPRED] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
> 2020-12-03T01:10:34,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 5 out of 17
> 2020-12-03T01:10:34,947  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-16:MAPRED] in parallel
> 2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 6 out of 17
> 2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-12:MAPRED] in parallel
> 2020-12-03T01:10:34,949  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 7 out of 17
> 2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-20:MAPRED] in parallel
> 2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-25:CONDITIONAL] in parallel
> 2020-12-03T01:10:36,950  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 8 out of 17
> 2020-12-03T01:10:36,951  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> 2020-12-01T22:20:17,774  INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Task:Stage-3:MAPRED Parent:Stage-31:MAPRED isDone:false
> 2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage: Stage-3for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e
> 2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e : FAILED: Some Execute Stage miss error
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)