[jira] [Comment Edited] (CALCITE-5871) Data distributions need to be combined and represented.

grandfisher (Jira) Tue, 25 Jul 2023 23:08:05 -0700


    [ https://issues.apache.org/jira/browse/CALCITE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747290#comment-17747290 ]


grandfisher edited comment on CALCITE-5871 at 7/26/23 6:07 AM:
---------------------------------------------------------------

OK, I have found that *RelCompositeTrait* can represent data that satisfies more
than one distribution trait.

However, it still leaves us unsure how to handle time-series distributed databases.
For example, in some databases such as Doris and ES, the data has both a
Partition and a Distribution. Suppose there are two days of data, 2023-01-01
and 2023-01-02. Each day's data has five buckets, and each row of a day's data
goes into a bucket chosen by a hash key.
If such a table is queried, how should its distribution be described?
The data as a whole satisfies {*}RANGE_DISTRIBUTION{*}, and the data inside each
*Partition* satisfies {*}HASH_DISTRIBUTION{*}. But we don't think this can be
expressed with {*}RelCompositeTrait{*}, because a composite trait lists
distributions that each hold over the whole data set, while here the hash
distribution holds only within each partition.



> Data distributions need to be combined and represented.
> -------------------------------------------------------
>
>                 Key: CALCITE-5871
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5871
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: grandfisher
>            Priority: Major
>
> For a distributed, partitioned database, the data may be partitioned by time
> and also hash-partitioned by the `region` field.
> If an Aggregate groups on (Day, Region), it is hard to express the
> distribution of the Aggregate rel: range(Day) combined with hash(region).
> Another case is a hash-shuffle join `(L JOIN R ON L.a = R.c AND L.b = R.d) AS T`.
> T now satisfies two distributions, Hash(a, b) and Hash(c, d); it does not
> satisfy Hash(a, b, c, d). But we have to drop one of them, because a
> RelDistribution can hold only one distribution.
> We think this is common in time-series distributed databases.
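
The join case can be illustrated with a minimal, hypothetical Java sketch. `MultiHashDistribution`, `Hash`, `satisfies` and `afterEquiJoin` are illustrative names, not Calcite's `RelDistribution` API. It assumes an inner equi-join where both inputs were shuffled with the same hash function, and, as a simplification, it treats a required hash distribution as met only by an exact key-list match. Keeping both alternatives lets a parent that needs Hash(c, d) avoid a reshuffle, while Hash(a, b, c, d) is correctly not claimed.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Calcite's RelDistribution API: a descriptor that
// keeps every hash distribution an operator's output satisfies.
public class MultiHashDistribution {

    // One hash distribution, identified by its ordered list of key columns.
    record Hash(List<String> keys) {}

    private final Set<Hash> alternatives;

    MultiHashDistribution(Collection<Hash> alternatives) {
        // Set.copyOf tolerates duplicates, e.g. when both sides use the same key names.
        this.alternatives = Set.copyOf(alternatives);
    }

    // Simplifying assumption: a requirement is met only by an exact key-list match.
    boolean satisfies(Hash required) {
        return alternatives.contains(required);
    }

    // Output of an inner hash-shuffle join L JOIN R ON L.a = R.c AND L.b = R.d.
    // Both inputs were shuffled on their join keys with the same hash function,
    // and every output row has a = c and b = d, so it is distributed by
    // Hash(a, b) and equally by Hash(c, d).
    static MultiHashDistribution afterEquiJoin(List<String> leftKeys, List<String> rightKeys) {
        return new MultiHashDistribution(List.of(new Hash(leftKeys), new Hash(rightKeys)));
    }

    public static void main(String[] args) {
        MultiHashDistribution t = afterEquiJoin(List.of("a", "b"), List.of("c", "d"));
        System.out.println(t.satisfies(new Hash(List.of("a", "b"))));           // true
        System.out.println(t.satisfies(new Hash(List.of("c", "d"))));           // true
        System.out.println(t.satisfies(new Hash(List.of("a", "b", "c", "d")))); // false
    }
}
```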



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
