gianm commented on issue #11929: URL: https://github.com/apache/druid/issues/11929#issuecomment-1026515751
This part: > Thus: > > ``` > INSERT INTO table > SELECT ... > PARTITION BY DAY, client, region ORDER BY order_id, line_item > ``` > > In the above, the total sort order is (__time, client, region, order_id, line_item). Makes me a little nervous because it suggests that the "PARTITION BY" will give you a total sorting for the table. But it might not, if the data is inserted in a sequence of INSERT calls. Each one will individually be sorted but there won't be a global sort until a compaction runs. However, there _will_ still be a global time-partition-key-sort (a.k.a. segment-granular sort), because we _strictly_ partition segments by the time partition key. There is exactly one time partition key (segment-granular timestamp) per segment, so we can always get a total ordering there. Not so for the secondary partitioning: because we have a variety of key values per segment, appending more segments can result in a non-totally-ordered situation. So that's another big difference between the time partitioning and the secondary partitioning. The mixing of keywords and expressions is also a little weird… This reflection is making me think that time partitioning is truly and overwhelmingly special, and so it really is a good idea to have the keywords be different. Which takes us back to CLUSTER BY (or CLUSTERED BY) being separate. Personally, after this reflection I'm a fan of: - PARTITIONED BY is _required_ and can either be a time floor function like `PARTITION BY FLOOR(__time TO DAY)` or `PARTITION BY TIME_FLOOR(__time, 'PT1H')`, a time unit like `PARTITION BY HOUR`, `PARTITION BY DAY`, `PARTITION BY MONTH`, or `PARTITION BY YEAR`; or the special phrase `PARTITION BY ALL TIME`. If you use the expression form, and provide an expression that is not a recognizable time floor expression, it's an error. - CLUSTERED BY is an _optional_ list of normal SQL expressions, and whatever you provide there, we'll use for secondary partitioning. If you provide nothing, we'll still do some kind of secondary partitioning, but it's up to us how we're going to do it. If there's a hypothetical future world where time partitioning isn't as special as it is today, we could loosen some of this stuff up, and perhaps move towards only using one clause. But we don't have plans right now to loosen this up, so I think it makes sense for the syntax to reflect the specialness of time. ---- As to how to replace time chunks, I kind of like using the verb REPLACE instead of INSERT. It makes sense to me to have append and overwrite both be top-level verbs. The way they work is very different, so the high-level distinction is good. There's also precedent in MySQL: https://dev.mysql.com/doc/refman/5.7/en/replace.html This version appeals to me, which is similar to MySQL's version: ``` REPLACE [INTO] table FOR <partitionSpec | (partitionSpec, partitionSpec, ...)> [insert-style column list] SELECT ... PARTITION BY ... ``` Where partitionSpec is one of: - PARTITION followed by SQL literal timestamp, like `PARTITION TIMESTAMP '2000-01-01 00:00:00'`: refers to the partition that starts with this timestamp. Must be aligned to the query's PARTITION BY. - PARTITION followed by literal string, like `PARTITION '2000-01-01/P1M'`: interpreted as an ISO8601 interval. Must be aligned to the query's PARTITION BY. - Keyword `ALL TIME`. Causes the whole table to be replaced. And the "FOR clause" is required. (I'm worried that making it optional will encourage accidental whole-table-replacement.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
