[ 
https://issues.apache.org/jira/browse/HIVE-26674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Végh updated HIVE-26674:
-------------------------------
    Description: A new compaction type is required for implicitly bucketed 
tables. Over time these tables can become unbalanced: the first few buckets 
hold the majority of the data, while buckets with a higher index hold 
progressively less. As a result, query performance on these unbalanced tables 
degrades over time. To solve this issue, the data needs to be periodically 
re-balanced among the buckets. The plan is to do this via a new RE-BALANCING 
compaction. This compaction can be issued either manually by users or 
automatically by the Initiator. Automatic re-balancing compaction must be 
triggered by evaluating a set of thresholds.

> REBALANCE type compaction
> -------------------------
>
>                 Key: HIVE-26674
>                 URL: https://issues.apache.org/jira/browse/HIVE-26674
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Végh
>            Assignee: László Végh
>            Priority: Major
>
> A new compaction type is required for implicitly bucketed tables. Over time 
> these tables can become unbalanced: the first few buckets hold the majority 
> of the data, while buckets with a higher index hold progressively less. As a 
> result, query performance on these unbalanced tables degrades over time. To 
> solve this issue, the data needs to be periodically re-balanced among the 
> buckets. The plan is to do this via a new RE-BALANCING compaction. This 
> compaction can be issued either manually by users or automatically by the 
> Initiator. Automatic re-balancing compaction must be triggered by evaluating 
> a set of thresholds.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
