[ 
https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903672#action_12903672
 ] 

Joydeep Sen Sarma commented on HIVE-1602:
-----------------------------------------

> combining small partitions into one large partition seems to be a natural 
> way.

sure - but i am worried that this is a fundamental change to hive's data model 
and may not be the quickest/safest solution to what is a pretty urgent problem.

also - HAR already solves the problem of packing small files into a big file, 
and it doesn't require changes to hive's data model. so in that sense it seems 
like an easy win.

you are still left with the large partition (skew) problem. this doesn't solve 
that either (assuming you are using reducers).

>  How can the user manually cluster event=s, event=m, event=l into one

insert overwrite table xxx partition (event_class) select a, b, c, event, 
case event when 's' then 'sml' when 'm' then 'sml' when 'l' then 'sml' else 
'g' end from ...
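
to make the effect of that CASE mapping concrete, here is a minimal sketch in plain Python (not Hive). it assumes rows shaped like (a, b, c, event) and mimics how a dynamic partition insert would route each row by the derived event_class value. the names event_class, partition_rows and SMALL_EVENTS are made up for illustration and are not part of hive.

```python
from collections import defaultdict

# Hypothetical mapping: the small events 's', 'm', 'l' share one partition
# value 'sml'; every other event falls into 'g', mirroring the CASE expression.
SMALL_EVENTS = {"s", "m", "l"}


def event_class(event):
    """Return the derived partition value for a row's event."""
    return "sml" if event in SMALL_EVENTS else "g"


def partition_rows(rows):
    """Group (a, b, c, event) rows by event_class.

    This only imitates the routing a dynamic partition insert performs;
    it does not talk to Hive or HDFS.
    """
    partitions = defaultdict(list)
    for a, b, c, event in rows:
        # The original event stays in the row, so it remains queryable
        # even though it is no longer the partition key.
        partitions[event_class(event)].append((a, b, c, event))
    return dict(partitions)


rows = [(1, 2, 3, "s"), (4, 5, 6, "m"), (7, 8, 9, "l"), (10, 11, 12, "x")]
parts = partition_rows(rows)
print(sorted(parts))  # ['g', 'sml']
```

four distinct event values end up in two partitions instead of four, and since event is still a regular column, queries can keep filtering on it inside the merged partition.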

> List Partitioning
> -----------------
>
>                 Key: HIVE-1602
>                 URL: https://issues.apache.org/jira/browse/HIVE-1602
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition 
> column values. Currently it creates one partition for each distinct DP column 
> value. This could result in skew in the created dynamic partitions, in that 
> some partitions are large but there could be a large number of small 
> partitions as well. This puts a burden on HDFS as well as the metastore. A 
> list partitioning scheme that aggregates a number of small partitions into 
> one big one is preferable for skewed partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
