[ https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555131#comment-17555131 ]
ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------
Commit 6909ee4f800da192b72e59680916e5004527b6db in kudu's branch
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=6909ee4f8 ]
KUDU-2671 update partition schema in catalog when adding range
When adding a range with custom hash schema to a table, it's necessary
to update the partition schema information stored in the system catalog
correspondingly. That was missing in one of the previous patches and
this patch addresses the issue.
This patch also adds a test scenario to spot regressions, if any. The
scenario was failing before the update in CatalogManager introduced
in this patch. I also addressed nits pointed to by the TidyBot.
This is a follow-up to 250eb90bc0e1f4f472f44de8a23ce213595d5ee7.
Change-Id: I869458fb8bcb06801b54f2b4869e7826322563e0
Reviewed-on: http://gerrit.cloudera.org:8080/18615
Tested-by: Kudu Jenkins
Reviewed-by: Mahesh Reddy <[email protected]>
Reviewed-by: Attila Bukor <[email protected]>
> Change hash number for range partitioning
> -----------------------------------------
>
> Key: KUDU-2671
> URL: https://issues.apache.org/jira/browse/KUDU-2671
> Project: Kudu
> Issue Type: Improvement
> Components: client, java, master, server
> Affects Versions: 1.8.0
> Reporter: yangz
> Assignee: Mahesh Reddy
> Priority: Major
> Labels: feature, roadmap-candidate, scalability
> Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our use case, the Kudu schema design isn't flexible enough.
> We create our tables with one range per day, such as dt='20181112', as in a
> Hive table. But our data size varies a lot from day to day: on one day it may
> be 50G, while on another day it may be 500G. In this case, it is hard to set
> the hash schema. If the hash number is too big, it is wasteful in most cases.
> But if it is too small, there is a performance problem when the amount of
> data is large.
>
> So we suggest a solution in which the hash number can be changed based on the
> historical data of a table.
> For example:
> # We create the schema with an estimated value.
> # We collect the data size for each day range.
> # We create each new day range partition based on the collected day sizes.
> We have used this feature for half a year, and it works well. We hope this
> feature will be useful for the community. The solution may not be complete
> yet. Please help us make it better.
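
The three steps in the description above can be sketched roughly as follows. This is only an illustration of the sizing idea, not code from the patch or from the reporter's implementation: the function names, the 10 GiB per-bucket target, and the upper cap of 64 are all assumptions made for the example, and the lower bound of 2 is a chosen floor. No Kudu client API is called here; the result is just the hash number one might pass along when adding the new range.

```python
import math

GIB = 1024 ** 3

# Assumption for illustration only: the amount of data one hash bucket
# should hold. This is not a Kudu recommendation.
TARGET_BYTES_PER_BUCKET = 10 * GIB


def buckets_for_size(estimated_bytes, target=TARGET_BYTES_PER_BUCKET,
                     min_buckets=2, max_buckets=64):
    """Pick a hash number so that each bucket holds about `target` bytes.

    The result is clamped to [min_buckets, max_buckets]; both bounds are
    hypothetical defaults chosen for this sketch.
    """
    needed = math.ceil(estimated_bytes / target)
    return max(min_buckets, min(max_buckets, needed))


def plan_next_range(history, recent_days=7):
    """Estimate the hash number for the next day range.

    `history` maps a day key such as '20181112' to the observed data size
    in bytes (step 2 in the description). The estimate is the plain average
    of the most recent `recent_days` entries (step 3).
    """
    recent_keys = sorted(history)[-recent_days:]
    average = sum(history[k] for k in recent_keys) / len(recent_keys)
    return buckets_for_size(average)


if __name__ == "__main__":
    observed = {"20181110": 50 * GIB, "20181111": 500 * GIB}
    print(plan_next_range(observed))
```

With the two day sizes mentioned in the description (50G and 500G), a per-range hash number avoids both extremes: the small day gets few buckets, the large day gets many, and a new range is sized from whatever the recent history suggests.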
--
This message was sent by Atlassian Jira
(v8.20.7#820007)