liyunzhang_intel created HIVE-17287:

             Summary: HoS can not deal with skewed data group by
                 Key: HIVE-17287
             Project: Hive
          Issue Type: Bug
            Reporter: liyunzhang_intel

The fact table {{store_sales}} is joined with the small tables {{date_dim}}, 
{{item}}, and {{store}}. After the join, a group by runs on the intermediate data.
The {{store_sales}} data in the 3TB TPC-DS dataset is skewed: there are 1824 
partitions; the biggest partition is 25.7G and the others are about 715M each.
hadoop fs -du -h 
715.0 M  
713.9 M  
714.1 M  
712.9 M  
25.7 G   
The skew in {{store_sales}} caused the job to fail. Is there any way to 
solve the group-by problem for a skewed table? I tried enabling 
{{hive.groupby.skewindata}} to first distribute the data more evenly and then do 
the group by, but the job still hangs. 
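
For reference, the two-stage idea behind {{hive.groupby.skewindata}} (spread rows randomly and pre-aggregate, then merge the partial results by the real key) can be sketched roughly as below. This is a minimal Python illustration only, not Hive's or Spark's implementation; the function name {{two_stage_sum}}, the {{num_salts}} parameter, and the sample rows are hypothetical.

```python
import random
from collections import defaultdict


def two_stage_sum(rows, num_salts=4, seed=0):
    """Sum values per key in two stages so a hot key is split across buckets.

    rows: iterable of (key, value) pairs.
    num_salts: hypothetical number of first-stage buckets per key.
    """
    rng = random.Random(seed)

    # Stage 1: attach a random salt to each row and pre-aggregate per
    # (key, salt), so no single bucket receives all rows of a hot key.
    partial = defaultdict(int)
    for key, value in rows:
        salt = rng.randrange(num_salts)
        partial[(key, salt)] += value

    # Stage 2: drop the salt and merge the partial sums by the real key.
    final = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        final[key] += subtotal
    return dict(final)


# Illustrative input in which one key dominates (made-up data).
rows = [("hot", 1)] * 1000 + [("a", 1)] * 10 + [("b", 1)] * 10
result = two_stage_sum(rows)
```

The sketch only applies to aggregates that can be merged from partial results (sum, count, min, max); it says nothing about why the job in this issue still hangs.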
