Sailesh Patel created KUDU-2224:
-----------------------------------

             Summary: Kudu Partition Dynamic Creation on Insertion
                 Key: KUDU-2224
                 URL: https://issues.apache.org/jira/browse/KUDU-2224
             Project: Kudu
          Issue Type: New Feature
    Affects Versions: 1.4.0
            Reporter: Sailesh Patel
            Priority: Minor

Option to specify a simpler partitioning directive whereby Kudu creates partitions on the fly, instead of requiring manual intervention to create additional partitions as described in:

https://kudu.apache.org/2016/08/23/new-range-partitioning-features.html
https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_tables ("Non-Covering Range Partitions")

+Requirement:+
When the table is created, a partitioning rule is specified that gives the granularity (size) of each partition. A new partition is then created:
- at insert time, when one does not exist for that value.

e.g. proposal:

CREATE TABLE sample_table (
  ts TIMESTAMP,
  eventid BIGINT,
  somevalue STRING,
  PRIMARY KEY (ts, eventid)
)
PARTITION BY RANGE(ts) GRANULARITY = 86400000000000 START = 1104537600000000
STORED AS KUDU;

- Maybe an optional END
- The START shows where the partition granularity builds from

-----

Use case: time series data where timestamps arrive out of order, can catch up from sometimes years in the past, and are unpredictable. Event information carries a timestamp (say epoch nanoseconds or epoch milliseconds), with partitions based upon a range of that timestamp (typically day or hour granularity).

Currently, we script the creation of partitions in advance of the data we receive, but if that script fails for any reason, the insert fails. Also, if we receive unexpected data with a timestamp far in the past for which no partition exists, the insert fails.

Opening this Jira enhancement for discussion.
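
One way to read the proposed GRANULARITY / START rule, as a minimal sketch rather than anything Kudu implements today: each incoming timestamp maps to exactly one half-open range [lower, upper) aligned to START, and that range is the partition that would be created at insert time if it did not already exist. The function name partition_bounds and the constant names are hypothetical. Note that the two literals in the proposal appear to use different units (the granularity equals one day in nanoseconds, while the start value reads as microseconds since the epoch); the sketch assumes microseconds for both, which is an assumption and not something the issue states.

```python
from typing import Tuple

# Assumption: all values are microseconds since the Unix epoch.
DAY_MICROS = 86_400 * 1_000_000       # one day, in microseconds
START = 1_104_537_600_000_000         # START literal from the proposal


def partition_bounds(ts: int, start: int, granularity: int) -> Tuple[int, int]:
    """Return the half-open range [lower, upper) that would contain ts.

    Hypothetical helper illustrating the proposed rule; not a Kudu API.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    # Floor division rounds toward negative infinity, so timestamps earlier
    # than start (late-arriving historical data) still land in an aligned bucket.
    index = (ts - start) // granularity
    lower = start + index * granularity
    return lower, lower + granularity


# An event three days and a few microseconds after START.
lower, upper = partition_bounds(START + 3 * DAY_MICROS + 5, START, DAY_MICROS)
print(lower, upper)

# An event one microsecond before START falls in the bucket that ends at START.
print(partition_bounds(START - 1, START, DAY_MICROS))
```

With a rule like this, out-of-order or very old timestamps need no pre-created partition: the bounds are derived from the value itself, so the missing range could be added on demand before the row is written.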