[
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384358#comment-17384358
]
ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------
Commit 586b7913258df2d0ee75470ddfb2b88d472ba235 in kudu's branch
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=586b791 ]
[client] KUDU-2671 custom hash buckets API for table creation
This patch introduces changes in the Kudu C++ client API to make it
possible to create a Kudu table with custom hash bucket schemas per
range partition.
This patch doesn't contain the rest of the functionality required to make
range partitions with custom hash bucket schemas fully functional; it
focuses on the API side only. The missing pieces will be addressed in
follow-up patches:
* update PartitionSchema to properly encode range keys in case
where range partitions with custom hash bucket schemas are present
(i.e. update PartitionSchema::EncodeKeyImpl() correspondingly)
* update the meta-cache to work with partition range keys built for
tables containing range partitions with custom hash bucket schema
* update other places in client code which are dependent on
PartitionPruner doing proper processing of partition key ranges for
tables containing range partitions with custom hash bucket schema
* add server-side checks to verify that the columns used for
custom hash bucket schemas are part of the primary key
* add provisions to allow for plain (i.e. without any hash
sub-partitioning) custom range partitions for tables with
table-wide hash bucket schema
* add end-to-end tests to verify the proper distribution of inserted
rows among range partitions and among their hash buckets
I also added test coverage to verify the newly introduced functionality
to some extent, making sure the appropriate number of tablets is created
for tables with custom hash bucket schemas per range, and adding TODOs
where full end-to-end coverage isn't yet available due to the missing
functionality outlined above.
Change-Id: I98fd9754db850dcdd00a00738f470673f42ac5b4
Reviewed-on: http://gerrit.cloudera.org:8080/17657
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <[email protected]>
> Change hash number for range partitioning
> -----------------------------------------
>
> Key: KUDU-2671
> URL: https://issues.apache.org/jira/browse/KUDU-2671
> Project: Kudu
> Issue Type: Improvement
> Components: client, java, master, server
> Affects Versions: 1.8.0
> Reporter: yangz
> Assignee: Mahesh Reddy
> Priority: Major
> Labels: feature, roadmap-candidate, scalability
> Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our usage, the Kudu schema design isn't flexible enough.
> We create our table with daily range partitions such as dt='20181112', like
> a Hive table. But our data size varies a lot from day to day: one day may be
> 50 GB while another is 500 GB. That makes it hard to pick a single hash
> schema. Too many buckets is wasteful in most cases, but too few causes
> performance problems on days with a large amount of data.
>
> So we suggest a solution: choose the hash bucket count from the table's
> historical data.
> For example:
> # we create the schema with an estimated initial value
> # we collect the data size per day range
> # we create each new daily range partition with a bucket count derived
> from the collected sizes
> We have used this feature for half a year, and it works well. We hope it
> will be useful for the community. The solution may not be complete;
> please help us make it better.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)