[
https://issues.apache.org/jira/browse/KUDU-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917853#comment-16917853
]
Grant Henke commented on KUDU-2224:
-----------------------------------
I think the concept of "Interval partitioning" makes sense here and we can get
inspiration from other databases which already have a similar feature. See
Oracle's interval partitioning, for example:
https://oracle-base.com/articles/11g/partitioning-enhancements-11gr1#interval_partitioning
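
To make the idea concrete, here is a minimal sketch of the arithmetic an
interval-partitioning scheme would need: given an anchor and a fixed
granularity, find the [lower, upper) range that a newly inserted value falls
into. This is not Kudu code. The function name `interval_bounds` and its
parameters are hypothetical, and it assumes START is only an alignment anchor
(so values before it still map to a partition) and that END, if given, is an
exclusive upper limit. It also assumes all values share one unit
(microseconds here); note that in the quoted proposal the GRANULARITY value
looks like one day in nanoseconds while the START value looks like
2005-01-01 UTC in microseconds.

```python
from typing import Optional, Tuple

# One day in microseconds (assumed unit for this sketch).
DAY_MICROS = 86_400_000_000


def interval_bounds(
    value: int,
    start: int,
    granularity: int,
    end: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the [lower, upper) bounds of the interval containing `value`.

    Hypothetical helper: `start` is treated as an alignment anchor, not a
    lower limit, so late-arriving values before `start` still get bounds.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    if end is not None and value >= end:
        raise ValueError("value is at or beyond the optional END bound")
    # Python's floor division rounds toward negative infinity, so values
    # before `start` land in the correct earlier interval.
    index = (value - start) // granularity
    lower = start + index * granularity
    return lower, lower + granularity


if __name__ == "__main__":
    anchor = 1_104_537_600_000_000  # 2005-01-01 00:00:00 UTC in microseconds
    print(interval_bounds(anchor + DAY_MICROS + 5, anchor, DAY_MICROS))
    print(interval_bounds(anchor - 1, anchor, DAY_MICROS))
```

With bounds computed this way, an insert path could check whether a partition
covering that range already exists and create it if not, which is the
behaviour the issue below asks for.
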
> Kudu Partition Dynamic Creation on Insertion
> --------------------------------------------
>
> Key: KUDU-2224
> URL: https://issues.apache.org/jira/browse/KUDU-2224
> Project: Kudu
> Issue Type: New Feature
> Affects Versions: 1.4.0
> Reporter: Sailesh Patel
> Assignee: HeLifu
> Priority: Minor
>
> Option to specify a simpler partitioning directive whereby Kudu creates
> partitions on the fly, instead of requiring manual intervention to create
> additional partitions as described in:
> https://kudu.apache.org/2016/08/23/new-range-partitioning-features.html
>
>
> https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_tables
> "Non-Covering Range Partitions"
>
> +Requirement:+
> When partitioning is defined, a partitioning rule specifies the granularity
> size, and a new partition is created:
> - at insert time, when one does not exist for the inserted value.
> e.g. proposed syntax:
> CREATE TABLE sample_table (ts TIMESTAMP, eventid BIGINT, somevalue STRING,
> PRIMARY KEY(ts,eventid) )
> PARTITION BY
> RANGE(ts) GRANULARITY= 86400000000000 START = 1104537600000000
> STORED AS KUDU;
> - Maybe an optional END
> - START indicates where the partition granularity builds from
> -----
> Use case
> - time series data where timestamps arrive out of order, can catch up from
> sometimes years in the past, and are unpredictable. Event
> information carries a timestamp (say, epoch nanoseconds or epoch milliseconds),
> with partitions based on a range of that timestamp (typically day or hour
> granularity).
> Currently, we script the creation of partitions in advance of the data we
> receive, but if those scripts fail for any reason, the insert fails. Also, if
> we receive unexpected data with a timestamp far in the past and there is no
> partition for it, the insert will fail.
> Opening this Jira enhancement for discussion.