[ 
https://issues.apache.org/jira/browse/HIVE-24467?focusedWorklogId=520885&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520885
 ]

ASF GitHub Bot logged work on HIVE-24467:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Dec/20 03:57
            Start Date: 07/Dec/20 03:57
    Worklog Time Spent: 10m 
      Work Description: gjhkael closed pull request #1743:
URL: https://github.com/apache/hive/pull/1743


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 520885)
    Time Spent: 40m  (was: 0.5h)

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24467
>                 URL: https://issues.apache.org/jira/browse/HIVE-24467
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.3.4
>            Reporter: guojh
>            Assignee: guojh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When hive execute jobs in parallel(control by “hive.exec.parallel” 
> parameter), ConditionalTasks  remove the tasks that not selected in parallel, 
> because there are thread safety issues, some task may not remove from the 
> dependent task tree. This is a very serious bug, which causes some stage task 
> not trigger execution.
> In our production cluster, the query run three conditional task in parallel, 
> after apply the patch of HIVE-21638, we found Stage-3 is miss and not submit 
> to runnable list for his parent Stage-31 is not done. But Stage-31 should 
> removed for it not selected.
> Stage dependencies is below:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-41 is a root stage
>   Stage-26 depends on stages: Stage-41
>   Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, 
> Stage-2
>   Stage-39 has a backup stage: Stage-2
>   Stage-23 depends on stages: Stage-39
>   Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, 
> Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
>   Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
>   Stage-5
>   Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
>   Stage-51 depends on stages: Stage-0
>   Stage-4
>   Stage-6
>   Stage-7 depends on stages: Stage-6
>   Stage-40 has a backup stage: Stage-2
>   Stage-24 depends on stages: Stage-40
>   Stage-2
>   Stage-44 is a root stage
>   Stage-30 depends on stages: Stage-44
>   Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, 
> Stage-12
>   Stage-42 has a backup stage: Stage-12
>   Stage-27 depends on stages: Stage-42
>   Stage-43 has a backup stage: Stage-12
>   Stage-28 depends on stages: Stage-43
>   Stage-12
>   Stage-47 is a root stage
>   Stage-34 depends on stages: Stage-47
>   Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, 
> Stage-16
>   Stage-45 has a backup stage: Stage-16
>   Stage-31 depends on stages: Stage-45
>   Stage-46 has a backup stage: Stage-16
>   Stage-32 depends on stages: Stage-46
>   Stage-16
>   Stage-50 is a root stage
>   Stage-38 depends on stages: Stage-50
>   Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, 
> Stage-20
>   Stage-48 has a backup stage: Stage-20
>   Stage-35 depends on stages: Stage-48
>   Stage-49 has a backup stage: Stage-20
>   Stage-36 depends on stages: Stage-49
>   Stage-20
> {code}
> Stage tasks execute log is below, we can see Stage-33 is conditional task and 
> it consists of Stage-45, Stage-46, Stage-16, Stage-16 is launched, Stage-45 
> and Stage-46 should remove from the dependent tree, Stage-31 is child of 
> Stage-45 parent of Stage-3, So, Stage-31 should removed too. As see in the 
> below log, we find Stage-31 is still in the parent list of Stage-3, this 
> should not happend.
> {code:java}
> 2020-12-03T01:09:50,939  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 1 out of 17
> 2020-12-03T01:09:50,940  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-26:MAPRED] in parallel
> 2020-12-03T01:09:50,941  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 2 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-30:MAPRED] in parallel
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 3 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-34:MAPRED] in parallel
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 4 out of 17
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-38:MAPRED] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
> 2020-12-03T01:10:34,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 5 out of 17
> 2020-12-03T01:10:34,947  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-16:MAPRED] in parallel
> 2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 6 out of 17
> 2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-12:MAPRED] in parallel
> 2020-12-03T01:10:34,949  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 7 out of 17
> 2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-20:MAPRED] in parallel
> 2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-25:CONDITIONAL] in parallel
> 2020-12-03T01:10:36,950  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 8 out of 17
> 2020-12-03T01:10:36,951  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-2:MAPRED] in parallel
> 2020-12-01T22:20:17,774  INFO [HiveServer2-Background-Pool: Thread-233156] 
> ql.Driver: Task:Stage-3:MAPRED Parent:Stage-31:MAPRED isDone:false
> 2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] 
> ql.Driver: Miss stage: Stage-3for queryid 
> hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e
> 2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] 
> ql.Driver: Miss stage for queryid 
> hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e : FAILED: Some 
> Execute Stage miss error
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to