[ https://issues.apache.org/jira/browse/HIVE-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962630#comment-13962630 ]
william zhu commented on HIVE-6831:
-----------------------------------

1. The query can be simplified to:
   Select * from (select * from TableA Union all select * from TableB) a
2. TableA and TableB each consist of other select queries.
3. I set the following Hive parameters:
   set hive.auto.convert.join=false;
   set hive.optimize.skewjoin = true;
   set hive.skewjoin.key = 500000;
   set hive.mapjoin.smalltable.filesize=50000000;

> The job schedule in condition task could not be correct with skewed join optimization
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-6831
>                 URL: https://issues.apache.org/jira/browse/HIVE-6831
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>   Affects Versions: 0.11.0
>         Environment: Hive 0.11.0
>            Reporter: william zhu
>         Attachments: 6831.patch
>
>
> Code snippet in ConditionalTask.java, as below:
>     // resolved task
>     if (driverContext.addToRunnable(tsk)) {
>       console.printInfo(tsk.getId() + " is selected by condition resolver.");
>     }
> The selected task is added to the runnable queue immediately, without any
> dependency checking. If the selected task is an original task and its parent
> task has not yet been executed, the result will be incorrect.
> For example:
> 1. Before skew join optimization:
>    Step1, Step2 <-- Step3 (Step1 and Step2 are Step3's parents)
> 2. After skew join optimization:
>    Step1 <- Step4 (ConditionalTask) <- consists of [Step3, Step10]
>    Step2 <- Step5 (ConditionalTask) <- consists of [Step3, Step11]
> 3. Running:
>    Step3 is selected in both Step4 and Step5.
>    Step3 is executed immediately after Step4, which is not correct.
>    Step3 is executed again after Step5, which is not correct either.
> 4. The correct schedule is for Step3 to be executed after both Step4 and Step5.
> 5. So I added a check to the snippet, as below:
>     if (!driverContext.getRunnable().contains(tsk)) {
>       console.printInfo(tsk.getId() + " is selected by condition resolver.");
>       if (DriverContext.isLaunchable(tsk)) {
>         driverContext.addToRunnable(tsk);
>       }
>     }
> This works correctly in my environment. I am not sure whether it causes
> problems under other conditions.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
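
To make the scheduling problem described above easier to follow, here is a small standalone sketch of it. It does not use Hive's real classes: Task, isLaunchable, resolve and simulate are all hypothetical stand-ins, and the sketch assumes that Step3 depends on both conditional tasks (Step4 and Step5), as item 4 of the description suggests. It only illustrates the idea of checking parents before queueing a selected task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkewJoinScheduleSketch {

    /** Hypothetical stand-in for a Hive task; not the real Task class. */
    static final class Task {
        final String id;
        final List<Task> parents = new ArrayList<>();
        boolean done;

        Task(String id, Task... parents) {
            this.id = id;
            this.parents.addAll(Arrays.asList(parents));
        }
    }

    /** Simplified launchability check (assumption): every parent has finished. */
    static boolean isLaunchable(Task t) {
        for (Task p : t.parents) {
            if (!p.done) {
                return false;
            }
        }
        return true;
    }

    /** Finishes a conditional task and queues the task its resolver selected. */
    static void resolve(Task conditional, Task selected,
                        List<Task> runnable, boolean checkDependencies) {
        conditional.done = true;
        if (runnable.contains(selected)) {
            return; // already queued
        }
        // Without the check, the selected task is queued unconditionally.
        if (!checkDependencies || isLaunchable(selected)) {
            runnable.add(selected);
        }
    }

    /** Runs Step4 then Step5, both selecting Step3, and logs when Step3 runs. */
    static List<String> simulate(boolean checkDependencies) {
        Task step4 = new Task("Step4");
        Task step5 = new Task("Step5");
        Task step3 = new Task("Step3", step4, step5);

        List<Task> runnable = new ArrayList<>();
        List<String> log = new ArrayList<>();
        for (Task cond : Arrays.asList(step4, step5)) {
            resolve(cond, step3, runnable, checkDependencies);
            // Drain the queue, recording which conditional task triggered the run.
            while (!runnable.isEmpty()) {
                Task t = runnable.remove(0);
                t.done = true;
                log.add(t.id + " after " + cond.id);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("without check: " + simulate(false));
        System.out.println("with check:    " + simulate(true));
    }
}
```

In this sketch, simulate(false) runs Step3 twice (once after Step4 and again after Step5), while simulate(true) runs it once, after Step5. Whether the real DriverContext behaves the same way in every plan shape is not something this sketch can show.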