[GitHub] [druid] FrankChen021 commented on issue #11929: Batch ingestion using SQL INSERT

GitBox Thu, 18 Nov 2021 23:16:17 -0800


FrankChen021 commented on issue #11929:
URL: https://github.com/apache/druid/issues/11929#issuecomment-973813297



   > With all this in mind it seems like the most conventional language would 
be PARTITION BY for segment granularity and CLUSTER BY for secondary 
partitioning. Meaning the query would look like:
   > 
   > ```sql
   > INSERT INTO tbl
   > SELECT ...
   > FROM ...
   > PARTITION BY FLOOR(__time TO DAY)
   > CLUSTER BY channel
   > ```
   > 
   > I think there is some risk here of confusion with "PARTITION BY" vs. 
Druid's "partitionsSpec" ingestion config, which also uses the word "partition" 
but refers more to the "clustering" concept. But I could believe this is fine 
for the sake of having the SQL language be more aligned with other DBs.
   > 
   > I'm ok with going with this language. What do people think?
   
   I think these two keywords have very clear semantics here, and it is 
intuitive for users to understand what they mean.
   They also map naturally onto Druid's segment granularity and partitionsSpec 
concepts.
   
   ClickHouse's PARTITION BY supports both kinds of expression (as below): one 
derived from a timestamp and one that is a plain column value. In most use 
cases I know of, users partition data by date, which is similar to Druid's 
segment granularity.
   
   ```sql
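   -- Partition by a month derived from a timestamp column
   -- (similar to Druid's segment granularity).
   CREATE TABLE table_with_where
   (
       d DateTime,
       a Int
   )
   ENGINE = MergeTree
   PARTITION BY toYYYYMM(d)
   ORDER BY d;  -- MergeTree expects a sorting key

   -- Partition by the value of an arbitrary column.
   CREATE TABLE table_for_recompression
   (
       d DateTime,
       key UInt64,
       value String
   ) ENGINE MergeTree()
   ORDER BY tuple()  -- empty sorting key
   PARTITION BY key;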
   ```
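   
   For comparison, here is a rough sketch of how the two levels might be 
expressed together in ClickHouse: PARTITION BY for the coarse, time-based 
split, and ORDER BY (the MergeTree sorting key) for the layout of rows inside 
each partition, which is loosely the role CLUSTER BY plays in the proposal. 
The table and column names are hypothetical, and the analogy to Druid's 
secondary partitioning is approximate rather than exact.
   
   ```sql
   -- Hypothetical table; names are illustrative only.
   CREATE TABLE wiki_edits
   (
       ts DateTime,
       channel String,
       added UInt64
   )
   ENGINE = MergeTree
   -- Coarse split by day, comparable to PARTITION BY FLOOR(__time TO DAY).
   PARTITION BY toDate(ts)
   -- Sorting key within each partition, loosely comparable to CLUSTER BY channel.
   ORDER BY (channel, ts);
   ```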
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]