[
https://issues.apache.org/jira/browse/CALCITE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747290#comment-17747290
]
grandfisher edited comment on CALCITE-5871 at 7/26/23 6:07 AM:
---------------------------------------------------------------
OK, I have found that *RelCompositeTrait* can handle data that satisfies more
than one distribution trait.
However, we are still confused about time-series distributed databases. For
example, in some databases such as Doris and ES, the data has both a Partition
and a Distribution. Suppose there are two days of data, 2023-01-01 and
2023-01-02. Each day's data has five buckets, and each day's rows go into the
corresponding bucket according to a certain hash key.
If such a table is queried, how should the data be considered distributed?
The data as a whole satisfies {*}RANGE_DISTRIBUTION{*}, and the data within
each *Partition* satisfies {*}HASH_DISTRIBUTION{*}. But we don't think this can
be expressed with {*}RelCompositeTrait{*}.
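
To make the layout concrete, here is a minimal sketch of the two-level
placement described above. It is only an illustration, not how Doris, ES, or
Calcite implement it: the function names, the CRC32 hash, and the sample hash
keys are all assumptions.

```python
import zlib
from datetime import date
from typing import Tuple

NUM_BUCKETS = 5  # five buckets per daily partition, as in the example


def partition_of(day: date) -> str:
    """Range level: each calendar day is its own partition (assumed naming)."""
    return day.isoformat()


def bucket_of(hash_key: str) -> int:
    """Hash level: bucket inside a partition (CRC32 is an arbitrary stand-in)."""
    return zlib.crc32(hash_key.encode("utf-8")) % NUM_BUCKETS


def placement(day: date, hash_key: str) -> Tuple[str, int]:
    """A row's physical location is the pair (partition, bucket)."""
    return partition_of(day), bucket_of(hash_key)


# Hypothetical rows: (day, hash key value).
rows = [
    (date(2023, 1, 1), "us-east"),
    (date(2023, 1, 1), "eu-west"),
    (date(2023, 1, 2), "us-east"),
    (date(2023, 1, 2), "eu-west"),
]
placements = [placement(day, key) for day, key in rows]
```

In this sketch the same hash key gets the same bucket number on both days but
lands in a different partition, so the location of a row is only known once
both levels are applied together. That is the nesting (hash within range) the
comment is asking about, as opposed to a flat list of alternative traits.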
> Data distributions need to be combined and represented.
> -------------------------------------------------------
>
> Key: CALCITE-5871
> URL: https://issues.apache.org/jira/browse/CALCITE-5871
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: grandfisher
> Priority: Major
>
> For a distributed partitioned database, the data may be range partitioned by
> time and also hash partitioned by the `region` field.
> If there is an aggregate on "(Day, Region)", it is hard to express the
> distribution of the Aggregate rel: range(Day) combined with hash(region).
> Another case is a hash shuffle join, `( L join R on L.a=R.c and L.b =R.d
> ) as T`. T now satisfies two distributions, Hash(a,b) and Hash(c,d); it is
> not Hash(a,b,c,d). But we must lose one of them because a RelDistribution
> can only hold one distribution.
> We think this is common in time-series distributed databases.
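
The hash shuffle join case in the description can be checked with a small
simulation. This is a sketch under stated assumptions, not Calcite code: the
worker count, the `worker_for` helper, the CRC32 hash, and the sample rows are
all hypothetical.

```python
import zlib

NUM_WORKERS = 4  # hypothetical number of shuffle targets


def worker_for(*key) -> int:
    """Assign a key tuple to a worker (CRC32 of its repr, an arbitrary choice)."""
    return zlib.crc32(repr(key).encode("utf-8")) % NUM_WORKERS


# Hypothetical inputs for `L join R on L.a = R.c and L.b = R.d`.
L = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
R = [{"c": 1, "d": 10}, {"c": 2, "d": 20}, {"c": 4, "d": 40}]

# Shuffle: L is hashed on (a, b) and R on (c, d), so matching rows meet.
shuffled = {w: {"L": [], "R": []} for w in range(NUM_WORKERS)}
for row in L:
    shuffled[worker_for(row["a"], row["b"])]["L"].append(row)
for row in R:
    shuffled[worker_for(row["c"], row["d"])]["R"].append(row)

# Local join on each worker; T keeps the worker id next to each joined row.
T = []
for worker, side in shuffled.items():
    for left in side["L"]:
        for right in side["R"]:
            if left["a"] == right["c"] and left["b"] == right["d"]:
                T.append((worker, {**left, **right}))
```

Every row of T sits on the worker chosen by hashing (a, b), and because
a = c and b = d on joined rows, that is also the worker chosen by hashing
(c, d). Hashing all four columns together is a different function and need not
pick the same worker, which is why the output is Hash(a,b) and Hash(c,d) at
once rather than Hash(a,b,c,d).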