gianm commented on issue #11929: URL: https://github.com/apache/druid/issues/11929#issuecomment-1023654380
PR #12163 is adding the PARTITON BY / CLUSTER BY syntax. That's great. I'd like to propose some additional syntax for controlling replace vs append, like what we do with `intervals` and `appendToExisting` in ingestion specs today. The idea is based on Spark's `INSERT OVERWRITE`, which is a similar concept: https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-overwrite-table.html. However, I suggest a few changes for our situation: 1) REPLACE is already a keyword in our dialect, so use that instead of OVERWRITE to avoid adding a new keyword. 2) We only support time partitioning, so include some nice nods to that in the syntax. 3) We don't have "partition by" metadata on the table, so it'd have to be provided in the query. For example, this query would overwrite the entire datasource `tbl`: ``` INSERT INTO tbl REPLACE ALL SELECT … PARTITION BY DAY ``` This query would overwrite the day 2020-01-01: ``` INSERT INTO tbl REPLACE PARTITION TIMESTAMP '2020-01-01 00:00:00' -- SQL literal timestamp SELECT … PARTITION BY DAY ``` This query would overwrite the month 2020-01. Note that in this example, the partition interval spans multiple partitions. I think this is OK and will be useful for jobs that want to replace a bunch of time chunks all at once. I wish it would say PARTITIONS instead of PARTITION, but PARTITIONS isn't a keyword currently and I'd like to avoid adding a new one. ``` INSERT INTO tbl REPLACE PARTITION '2020-01/P1M' -- ISO8601 interval SELECT … PARTITION BY DAY ``` Validation conditions: 1. With `REPLACE PARTITION TIMESTAMP '<sql timestamp>'`, the timestamp must be aligned to the provided `PARTITION BY` (it must be the starting timestamp of a time chunk). 2. With `REPLACE PARTITION '<iso8601 interval>'`, the interval must be aligned to the provided `PARTITION BY`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
