[ https://issues.apache.org/jira/browse/HIVE-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962637#comment-13962637 ]
william zhu commented on HIVE-6831:
-----------------------------------

The problem is that the union operation is resolved into two steps (A and B). Both steps have one child step (C), which aggregates the results of A and B. In the skew join conditional task, C is selected without any check that its parent steps (A and B) have completed.

> The job schedule in condition task could not be correct with skewed join optimization
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-6831
>                 URL: https://issues.apache.org/jira/browse/HIVE-6831
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.11.0
>         Environment: Hive 0.11.0
>            Reporter: william zhu
>         Attachments: 6831.patch
>
>
> Code snippet in ConditionalTask.java, as below:
>   // resolved task
>   if (driverContext.addToRunnable(tsk)) {
>     console.printInfo(tsk.getId() + " is selected by condition resolver.");
>   }
> The selected task is added to the runnable queue immediately, without any dependency checking. If the selected task is an original task and its parent task has not been executed yet, the result will be incorrect.
> For example:
> 1. Before skew join optimization:
>    Step1, Step2 <- Step3 (Step1 and Step2 are Step3's parents)
> 2. After skew join optimization:
>    Step1 <- Step4 (ConditionalTask) <- consists of [Step3, Step10]
>    Step2 <- Step5 (ConditionalTask) <- consists of [Step3, Step11]
> 3. Running:
>    Step3 is selected in both Step4 and Step5.
>    Step3 will be executed immediately after Step4, which is not correct.
>    Step3 will be executed again after Step5, which is not correct either.
> 4. The correct schedule is that Step3 is executed after both Step4 and Step5.
> 5. So, I added a check to the snippet, as below:
>   if (!driverContext.getRunnable().contains(tsk)) {
>     console.printInfo(tsk.getId() + " is selected by condition resolver.");
>     if (DriverContext.isLaunchable(tsk)) {
>       driverContext.addToRunnable(tsk);
>     }
>   }
> This works correctly in my environment. I am not sure whether it will cause problems under other conditions.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
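The scheduling difference described in the report can be illustrated with a small, self-contained model. This is a minimal sketch, not Hive's code: `SimpleTask`, `SimpleDriverContext`, `addUnchecked`, `addIfLaunchable` and `simulate` are hypothetical stand-ins for Hive's `Task`, `DriverContext` and `ConditionalTask` logic. It assumes a task may be launched only when it is not already queued (the `getRunnable().contains(tsk)` check in the patch) and all of its parents have finished, which is assumed here to be the intent of `DriverContext.isLaunchable`. The model uses Step3's original parents (Step1 and Step2) and has the two conditional tasks (Step4 and Step5) each select Step3 once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class SkewJoinScheduleSketch {

    // Hypothetical stand-in for a Hive task: tracks only parents, completion and queueing.
    static final class SimpleTask {
        final String id;
        final List<SimpleTask> parents = new ArrayList<>();
        boolean done = false;
        boolean queued = false;

        SimpleTask(String id) { this.id = id; }

        // True only once every parent task has finished.
        boolean allParentsDone() {
            for (SimpleTask p : parents) {
                if (!p.done) return false;
            }
            return true;
        }
    }

    // Hypothetical stand-in for the driver's runnable queue.
    static final class SimpleDriverContext {
        final Queue<SimpleTask> runnable = new ArrayDeque<>();

        // Behaviour described in the report: enqueue the selected task immediately.
        void addUnchecked(SimpleTask t) {
            runnable.add(t);
        }

        // Checked behaviour: enqueue only if not already queued and all parents are done.
        boolean addIfLaunchable(SimpleTask t) {
            if (t.queued || !t.allParentsDone()) return false;
            t.queued = true;
            runnable.add(t);
            return true;
        }
    }

    // Returns a log of every time Step3 is put on the runnable queue.
    static List<String> simulate(boolean checked) {
        SimpleTask step1 = new SimpleTask("Step1");
        SimpleTask step2 = new SimpleTask("Step2");
        SimpleTask step3 = new SimpleTask("Step3");
        step3.parents.add(step1);
        step3.parents.add(step2);

        SimpleDriverContext ctx = new SimpleDriverContext();
        List<String> log = new ArrayList<>();

        // Step1 finishes, then its conditional task (Step4) selects Step3.
        step1.done = true;
        if (enqueue(ctx, step3, checked)) log.add("Step3 queued by Step4");

        // Step2 finishes, then its conditional task (Step5) selects Step3 again.
        step2.done = true;
        if (enqueue(ctx, step3, checked)) log.add("Step3 queued by Step5");

        return log;
    }

    private static boolean enqueue(SimpleDriverContext ctx, SimpleTask t, boolean checked) {
        if (checked) return ctx.addIfLaunchable(t);
        ctx.addUnchecked(t);
        return true;
    }

    public static void main(String[] args) {
        System.out.println("unchecked: " + simulate(false));
        System.out.println("checked:   " + simulate(true));
    }
}
```

Running `main` should print `unchecked: [Step3 queued by Step4, Step3 queued by Step5]` and `checked:   [Step3 queued by Step5]`. In the unchecked model, Step3 is queued twice, the first time before Step2 has finished, which mirrors step 3 of the report. With both checks, Step3 is queued once, after both parents are done. Whether this matches Hive's real `isLaunchable` semantics in every plan shape is an assumption, which is the same open question the reporter raises.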