[jira] [Commented] (KUDU-2224) Kudu Partition Dynamic Creation on Insertion

Grant Henke (Jira) Wed, 28 Aug 2019 08:26:43 -0700


    [ 
https://issues.apache.org/jira/browse/KUDU-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917853#comment-16917853
 ]


Grant Henke commented on KUDU-2224:
-----------------------------------

I think the concept of "interval partitioning" makes sense here, and we can take 
inspiration from other databases that already have a similar feature. See 
Oracle's interval partitioning, for example: 
https://oracle-base.com/articles/11g/partitioning-enhancements-11gr1#interval_partitioning
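
To make the idea concrete, here is a minimal sketch of the arithmetic an interval scheme would need: mapping a key to the half-open range [lower, upper) that should hold it. This is not Kudu code. The function name is hypothetical, it assumes START and GRANULARITY are in the same unit (microseconds here; the quoted proposal appears to mix nanoseconds and microseconds), and it assumes intervals extend in both directions from START so that late data from the past also maps to a range.

```python
from typing import Tuple

# Assumption: all values are in microseconds since the Unix epoch.
MICROS_PER_DAY = 86_400 * 1_000_000


def interval_bounds(value: int, start: int, granularity: int) -> Tuple[int, int]:
    """Return the half-open range [lower, upper) that contains `value`.

    Ranges are aligned to `start` and are `granularity` wide. Floor
    division keeps the alignment correct for values before `start`.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    index = (value - start) // granularity
    lower = start + index * granularity
    return lower, lower + granularity
```

With a one-day granularity, a key equal to START falls in the range beginning at START, and a key one microsecond earlier falls in the range that ends at START.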

> Kudu Partition Dynamic Creation on Insertion
> --------------------------------------------
>
>                 Key: KUDU-2224
>                 URL: https://issues.apache.org/jira/browse/KUDU-2224
>             Project: Kudu
>          Issue Type: New Feature
>    Affects Versions: 1.4.0
>            Reporter: Sailesh Patel
>            Assignee: HeLifu
>            Priority: Minor
>
> Option to specify a simpler partitioning directive whereby Kudu creates 
> partitions on the fly, instead of requiring manual intervention to create 
> additional partitions as described in:
>   https://kudu.apache.org/2016/08/23/new-range-partitioning-features.html
>   
>   
> https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_tables
>        "Non-Covering Range Partitions"
>   
> +Requirement:+
>    When the table is created, a partitioning rule specifies the granularity 
> (interval size), and a new partition is created:
>     - at insert time, when none exists for the inserted value.
> e.g. proposal:
> CREATE TABLE sample_table (ts TIMESTAMP, eventid BIGINT, somevalue STRING, 
> PRIMARY KEY(ts,eventid) )
> PARTITION BY 
> RANGE(ts) GRANULARITY= 86400000000000 START = 1104537600000000 
> STORED AS KUDU;
>    - Maybe an optional END
>    - START shows where the partition granularity builds from
> -----    
> Use case
> - Time series data where timestamps arrive out of order, can catch up from 
> sometimes years in the past, and are unpredictable. Event information is 
> keyed by a timestamp (say epoch nanoseconds or epoch milliseconds), with 
> partitions based upon a range of that timestamp (typically day or hour 
> granularity).
> Currently, we script the creation of partitions in advance of the data we 
> receive, but if that script fails for any reason the insert fails. Also, if 
> we receive unexpected data with a timestamp far in the past and there is no 
> partition for it, the insert fails.
> Opening this Jira enhancement for discussion.
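
The workaround described in the use case (creating partitions ahead of the insert) can be sketched as a pre-insert check: given a batch of timestamps and the lower bounds of the ranges that already exist, work out which ranges are still missing. This is only an illustration under assumptions. The helper name is hypothetical, it makes no Kudu client calls, and it assumes every existing range is aligned to START and exactly one GRANULARITY wide. The caller would still have to add each returned range through whatever mechanism it already uses.

```python
from typing import Iterable, List, Tuple


def missing_ranges(
    timestamps: Iterable[int],
    existing_lower_bounds: Iterable[int],
    start: int,
    granularity: int,
) -> List[Tuple[int, int]]:
    """Return the sorted [lower, upper) ranges needed by `timestamps`
    that are not already present in `existing_lower_bounds`."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    # Align each timestamp down to the lower bound of its interval.
    needed = {
        start + ((ts - start) // granularity) * granularity
        for ts in timestamps
    }
    to_create = sorted(needed - set(existing_lower_bounds))
    return [(lower, lower + granularity) for lower in to_create]
```

Because the result is computed per batch, a single out-of-order timestamp from years back yields exactly one extra range rather than every range between then and now.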



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
