[ https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903692#action_12903692 ]

Joydeep Sen Sarma commented on HIVE-1602:
-----------------------------------------

yeah. but i have been asking how you are planning to make the grouping of 
partitions transparent. to me that sounds like a very big and risky change, 
and there are no details here.

why would we do this at the hive layer given we already have HAR?

i really don't understand why we wouldn't start with HIVE-1467 and then add HAR 
as an optimization to reduce number of files for small partitions. this doesn't 
address the skew case. it doesn't address the fact that we still have to 
partition by dynamic partitioning columns - and that requires the same 
partition-only map-reduce operator that 1467 requires. at which point - we can 
just do 1467.

what am i missing?

> List Partitioning
> -----------------
>
>                 Key: HIVE-1602
>                 URL: https://issues.apache.org/jira/browse/HIVE-1602
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition 
> column values. Currently it creates one partition for each distinct DP column 
> value. This can result in skew among the created dynamic partitions: some 
> partitions are large, but there can be a large number of small partitions 
> as well. This places a burden on both HDFS and the metastore. A list 
> partitioning scheme that aggregates a number of small partitions into one big 
> one is preferable for skewed partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.