[jira] [Commented] (HIVE-6831) The job schedule in condition task could not be correct with skewed join optimization

william zhu (JIRA) Mon, 07 Apr 2014 23:31:33 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962630#comment-13962630
 ]


william zhu commented on HIVE-6831:
-----------------------------------

1. The query can be simplified to:
Select * from 
(select * from TableA Union all select * from TableB) a
2. TableA and TableB each consist of other select queries.
3. I set the following Hive parameters:
set hive.auto.convert.join=false;
set hive.optimize.skewjoin = true;
set hive.skewjoin.key = 500000;
set hive.mapjoin.smalltable.filesize=50000000;



> The job schedule in condition task could not be correct with skewed join 
> optimization
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-6831
>                 URL: https://issues.apache.org/jira/browse/HIVE-6831
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.11.0
>         Environment: Hive 0.11.0
>            Reporter: william zhu
>         Attachments: 6831.patch
>
>
> Code snippet in ConditionalTask.java, as below:
>     // resolved task
>     if (driverContext.addToRunnable(tsk)) {
>       console.printInfo(tsk.getId() + " is selected by condition resolver.");
>     }
> The selected task is added to the runnable queue immediately, without any 
> dependency checking. If the selected task is an original task and one of its 
> parent tasks has not yet been executed, the result will be incorrect.
> For example:
> 1. Before skew join optimization:
> Step1, Step2 <-- Step3   (Step1 and Step2 are Step3's parents)
> 2. After skew join optimization:
> Step1 <- Step4 (ConditionalTask) <- consists of [Step3, Step10]
> Step2 <- Step5 (ConditionalTask) <- consists of [Step3, Step11]
> 3. Running:
> Step3 is selected in both Step4 and Step5.
> Step3 will be executed immediately after Step4, which is not correct.
> Step3 will be executed again after Step5, which is not correct either.
> 4. The correct schedule is for Step3 to be executed after both Step4 and Step5.
> 5. So I added a check to the snippet, as below:
>         if (!driverContext.getRunnable().contains(tsk)) {
>           console.printInfo(tsk.getId() + " is selected by condition resolver.");
>           if (DriverContext.isLaunchable(tsk)) {
>             driverContext.addToRunnable(tsk);
>           }
>         }
> This works correctly for me in my environment. I am not sure whether it 
> will cause problems under other conditions.
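
To make the scheduling problem described above easier to follow, here is a minimal, self-contained sketch of it. It does not use Hive's real classes: SketchTask, ConditionalScheduleSketch, and run(...) are hypothetical names made up for illustration, and the model assumes (as a simplification of the description) that Step3 simply depends on Step4 and Step5. With the check disabled the selected task runs right after each conditional task; with the check enabled it runs only once both parents are done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Hive task: it only tracks parents and completion.
class SketchTask {
    final String id;
    final List<SketchTask> parents = new ArrayList<>();
    boolean done = false;

    SketchTask(String id, SketchTask... parents) {
        this.id = id;
        this.parents.addAll(Arrays.asList(parents));
    }

    // Launchable only if it has not run yet and every parent has finished.
    boolean isLaunchable() {
        if (done) {
            return false;
        }
        for (SketchTask p : parents) {
            if (!p.done) {
                return false;
            }
        }
        return true;
    }
}

class ConditionalScheduleSketch {
    // Simulates Step4 and Step5 each selecting Step3, and returns the
    // resulting execution order. This is an illustrative model, not Hive code.
    static List<String> run(boolean checkDependencies) {
        SketchTask step4 = new SketchTask("Step4");
        SketchTask step5 = new SketchTask("Step5");
        SketchTask step3 = new SketchTask("Step3", step4, step5);

        List<String> order = new ArrayList<>();
        for (SketchTask cond : Arrays.asList(step4, step5)) {
            cond.done = true;
            order.add(cond.id);
            // The resolver selects Step3 in both conditional tasks.
            if (!checkDependencies || step3.isLaunchable()) {
                step3.done = true;
                order.add(step3.id);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [Step4, Step3, Step5, Step3]
        System.out.println(run(true));  // [Step4, Step5, Step3]
    }
}
```

The second call corresponds to the schedule described as correct in point 4: Step3 runs once, after both Step4 and Step5.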



--
This message was sent by Atlassian JIRA
(v6.2#6252)
