[
https://issues.apache.org/jira/browse/HIVE-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962630#comment-13962630
]
william zhu commented on HIVE-6831:
-----------------------------------
1. The query can be simplified like this:
Select * from
(select * from TableA Union all select * from TableB) a
2. TableA and TableB each consist of other select queries.
3. I set the following Hive parameters:
set hive.auto.convert.join=false;
set hive.optimize.skewjoin = true;
set hive.skewjoin.key = 500000;
set hive.mapjoin.smalltable.filesize=50000000;
> The job schedule in condition task could not be correct with skewed join
> optimization
> -------------------------------------------------------------------------------------
>
> Key: HIVE-6831
> URL: https://issues.apache.org/jira/browse/HIVE-6831
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.11.0
> Environment: Hive 0.11.0
> Reporter: william zhu
> Attachments: 6831.patch
>
>
> Code snippet in ConditionalTask.java as below:
> // resolved task
> if (driverContext.addToRunnable(tsk)) {
>   console.printInfo(tsk.getId() + " is selected by condition resolver.");
> }
> The selected task is added to the runnable queue immediately, without any
> dependency check. If the selected task is an original task and one of its
> parent tasks has not yet been executed, the result will be incorrect.
> For example:
> 1. Before skew join optimization:
> Step1, Step2 <-- Step3 (Step1 and Step2 are Step3's parents)
> 2. After skew join optimization:
> Step1 <- Step4 (ConditionalTask) <- consists of [Step3, Step10]
> Step2 <- Step5 (ConditionalTask) <- consists of [Step3, Step11]
> 3. Running:
> Step3 is selected in both Step4 and Step5.
> Step3 is executed immediately after Step4, which is not correct.
> Step3 is executed again after Step5, which is not correct either.
> 4. The correct schedule is for Step3 to be executed only after both Step4 and Step5.
> 5. So I added a check to the snippet, as below:
> if (!driverContext.getRunnable().contains(tsk)) {
>   console.printInfo(tsk.getId() + " is selected by condition resolver.");
>   if (DriverContext.isLaunchable(tsk)) {
>     driverContext.addToRunnable(tsk);
>   }
> }
> This works correctly for me in my environment. I am not sure whether it
> will cause problems under other conditions.
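
The scheduling rule described in steps 3 to 5 can be sketched in isolation. The following is a minimal, self-contained model and not Hive code: the `Task` class, `isLaunchable`, and `selectGuarded` are hypothetical stand-ins for Hive's task objects, `DriverContext.isLaunchable`, and the patched snippet, and the runnable queue is modeled as a plain set. It assumes Step3 has Step4 and Step5 as parents after the skew join rewrite.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ConditionalScheduleSketch {

    /** Hypothetical stand-in for a task: an id, its parents, and a done flag. */
    static final class Task {
        final String id;
        final List<Task> parents = new ArrayList<>();
        boolean done = false;

        Task(String id, Task... parents) {
            this.id = id;
            for (Task p : parents) {
                this.parents.add(p);
            }
        }
    }

    /** A task is launchable here when it has not run and all parents are done. */
    static boolean isLaunchable(Task t) {
        if (t.done) {
            return false;
        }
        for (Task p : t.parents) {
            if (!p.done) {
                return false;
            }
        }
        return true;
    }

    /**
     * Guarded selection, mirroring the proposed check: queue the selected task
     * only if it is not already queued and all of its parents have finished.
     * Returns true when the task was actually added to the queue.
     */
    static boolean selectGuarded(Set<Task> runnable, Task selected) {
        if (runnable.contains(selected)) {
            return false; // already queued by another conditional task
        }
        if (!isLaunchable(selected)) {
            return false; // some parent is still pending
        }
        return runnable.add(selected);
    }

    public static void main(String[] args) {
        Task step4 = new Task("Step4");
        Task step5 = new Task("Step5");
        Task step3 = new Task("Step3", step4, step5);
        Set<Task> runnable = new LinkedHashSet<>();

        step4.done = true; // Step4 finishes and selects Step3
        System.out.println("after Step4, queued: " + selectGuarded(runnable, step3));

        step5.done = true; // Step5 finishes and selects Step3
        System.out.println("after Step5, queued: " + selectGuarded(runnable, step3));
        System.out.println("queue size: " + runnable.size());
    }
}
```

In this model Step3 is rejected when only Step4 has finished and is queued once after Step5 also finishes, which is the ordering step 4 asks for. Whether the real `DriverContext` behaves the same way under other plans is not something this sketch can show.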
--
This message was sent by Atlassian JIRA
(v6.2#6252)