FrankChen021 commented on issue #11929:
URL: https://github.com/apache/druid/issues/11929#issuecomment-973813297
> With all this in mind it seems like the most conventional language would
> be PARTITION BY for segment granularity and CLUSTER BY for secondary
> partitioning. Meaning the query would look like:
>
> ```sql
> INSERT INTO tbl
> SELECT ...
> FROM ...
> PARTITION BY FLOOR(__time TO DAY)
> CLUSTER BY channel
> ```
>
> I think there is some risk here of confusion with "PARTITION BY" vs.
> Druid's "partitionsSpec" ingestion config, which also uses the word "partition"
> but refers more to the "clustering" concept. But I could believe this is fine
> for the sake of having the SQL language be more aligned with other DBs.
>
> I'm ok with going with this language. What do people think?
I think these two keywords have very clear semantics here, and it is
intuitive for users to understand what they mean.
They also map naturally onto Druid's segment granularity and partitionsSpec
concepts.
ClickHouse's PARTITION BY supports both of these semantics (as shown below),
but in most use cases I know of, users prefer to partition data by date,
which is similar to Druid's segment granularity.
```sql
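-- Partition by month, derived from a date column
-- (similar to Druid's segment granularity).
CREATE TABLE table_with_where
(
    d DateTime,
    a Int
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d;  -- MergeTree requires a sorting key

-- Partition by an arbitrary, non-time column.
CREATE TABLE table_for_recompression
(
    d DateTime,
    key UInt64,
    value String
)
ENGINE = MergeTree()
PARTITION BY key
ORDER BY tuple();  -- empty sorting key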
```
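As a rough illustration of how the two levels in the proposed syntax relate, here is a toy sketch in Python. It is not Druid or ClickHouse code: the function name `partition_and_cluster` and the sample rows are made up for this example, and a plain sort stands in for secondary partitioning. Rows are first bucketed by day (the `PARTITION BY FLOOR(__time TO DAY)` part) and then ordered by `channel` within each bucket (the `CLUSTER BY channel` part).

```python
from collections import defaultdict
from datetime import datetime


def partition_and_cluster(rows):
    """Bucket rows by day, then order each bucket by channel.

    Hypothetical helper for illustration only; real engines do this
    at the storage layer, not in application code.
    """
    buckets = defaultdict(list)
    for row in rows:
        # Analogous to PARTITION BY FLOOR(__time TO DAY).
        buckets[row["__time"].date()].append(row)
    # Analogous to CLUSTER BY channel (a sort is used as a stand-in).
    return {
        day: sorted(group, key=lambda r: r["channel"])
        for day, group in sorted(buckets.items())
    }


# Made-up sample data.
rows = [
    {"__time": datetime(2021, 11, 18, 9, 0), "channel": "#en"},
    {"__time": datetime(2021, 11, 19, 1, 0), "channel": "#fr"},
    {"__time": datetime(2021, 11, 18, 23, 0), "channel": "#de"},
]

result = partition_and_cluster(rows)
for day, group in result.items():
    print(day, [r["channel"] for r in group])
```

The point of the sketch is only that the time-based bucket is chosen first and the clustering column matters within a bucket, which is the distinction the two keywords are meant to convey.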
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]