gianm commented on issue #11929:
URL: https://github.com/apache/druid/issues/11929#issuecomment-1026515751


   This part:
   
   > Thus:
   > 
   > ```
   > INSERT INTO table
   > SELECT ...
   > PARTITION BY DAY, client, region ORDER BY order_id, line_item
   > ```
   > 
   > In the above, the total sort order is (__time, client, region, order_id, 
line_item).
   
   Makes me a little nervous because it suggests that the "PARTITION BY" will 
give you a total sorting for the table. But it might not, if the data is 
inserted in a sequence of INSERT calls. Each one will individually be sorted 
but there won't be a global sort until a compaction runs.
   
   However, there _will_ still be a global time-partition-key-sort (a.k.a. 
segment-granular sort), because we _strictly_ partition segments by the time 
partition key. There is exactly one time partition key (segment-granular 
timestamp) per segment, so we can always get a total ordering there. Not so for 
the secondary partitioning: because we have a variety of key values per 
segment, appending more segments can result in a non-totally-ordered situation. 
So that's another big difference between the time partitioning and the 
secondary partitioning.
   
   The mixing of keywords and expressions is also a little weird…
   
   This reflection is making me think that time partitioning is truly and 
overwhelmingly special, and so it really is a good idea to have the keywords be 
different. Which takes us back to CLUSTER BY (or CLUSTERED BY) being separate.
   
   Personally, after this reflection I'm a fan of:
   
   - PARTITIONED BY is _required_ and can either be a time floor function like 
`PARTITION BY FLOOR(__time TO DAY)` or `PARTITION BY TIME_FLOOR(__time, 
'PT1H')`, a time unit like `PARTITION BY HOUR`, `PARTITION BY DAY`, `PARTITION 
BY MONTH`, or `PARTITION BY YEAR`; or the special phrase `PARTITION BY ALL 
TIME`. If you use the expression form, and provide an expression that is not a 
recognizable time floor expression, it's an error.
   - CLUSTERED BY is an _optional_ list of normal SQL expressions, and whatever 
you provide there, we'll use for secondary partitioning. If you provide 
nothing, we'll still do some kind of secondary partitioning, but it's up to us 
how we're going to do it.
   
   If there's a hypothetical future world where time partitioning isn't as 
special as it is today, we could loosen some of this stuff up, and perhaps move 
towards only using one clause. But we don't have plans right now to loosen this 
up, so I think it makes sense for the syntax to reflect the specialness of time.
   
   ----
   
   As to how to replace time chunks, I kind of like using the verb REPLACE 
instead of INSERT. It makes sense to me to have append and overwrite both be 
top-level verbs. The way they work is very different, so the high-level 
distinction is good. There's also precedent in MySQL: 
https://dev.mysql.com/doc/refman/5.7/en/replace.html
   
   This version appeals to me, which is similar to MySQL's version:
   
   ```
   REPLACE
   [INTO] table
   FOR <partitionSpec | (partitionSpec, partitionSpec, ...)>
   [insert-style column list]
   SELECT ...
   PARTITION BY ...
   ```
   
   Where partitionSpec is one of:
   
   - PARTITION followed by SQL literal timestamp, like `PARTITION TIMESTAMP 
'2000-01-01 00:00:00'`: refers to the partition that starts with this 
timestamp. Must be aligned to the query's PARTITION BY.
   - PARTITION followed by literal string, like `PARTITION '2000-01-01/P1M'`: 
interpreted as an ISO8601 interval. Must be aligned to the query's PARTITION BY.
   - Keyword `ALL TIME`. Causes the whole table to be replaced.
   
   And the "FOR clause" is required. (I'm worried that making it optional will 
encourage accidental whole-table-replacement.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to