[jira] [Commented] (CALCITE-5871) Data distributions need to be combined and represented.

LakeShen (Jira) Sun, 30 Jul 2023 07:56:05 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748939#comment-17748939
 ]


LakeShen commented on CALCITE-5871:
-----------------------------------

Hi [~grandfisher], I get your problem: the partition is a table-level 
attribute, while the bucket HASH_DISTRIBUTION is a partition-level attribute.

Recently I was reading the paper "Incorporating Partitioning and Parallel Plans 
into the SCOPE Optimizer"; you might be able to get some information from it.

> Data distributions need to be combined and represented.
> -------------------------------------------------------
>
>                 Key: CALCITE-5871
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5871
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: grandfisher
>            Priority: Major
>
> For a distributed partitioned database, the data may be partitioned by time 
> and also hash partitioned by the `region` field.
> If there is an aggregate on "(Day, Region)", it is hard to represent the 
> distribution of the AGG rel (range(Day), hash(region)).
> Another case is a hash shuffle join, `(L join R on L.a = R.c and L.b = R.d) 
> as T`. T now satisfies two distributions, one is Hash(a,b) and the other 
> is Hash(c,d); it is not Hash(a,b,c,d). But we must lose one of them because 
> a RelDistribution can hold only one distribution.
> We think this is common in time-series distributed databases.
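
To make the join case concrete, here is a minimal sketch in plain Python. It does not use Calcite's API: `HashDist`, `MultiDist` and `join_distribution` are hypothetical names invented for illustration. It assumes both join inputs were shuffled with the same hash function and partition count, so that every output row of T is co-located under Hash(a,b) and under Hash(c,d) at the same time.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class HashDist:
    """Hypothetical stand-in for one hash distribution on ordered keys."""
    keys: Tuple[str, ...]


@dataclass(frozen=True)
class MultiDist:
    """Hypothetical trait: a set of distributions that all hold at once."""
    alternatives: FrozenSet[HashDist]

    def satisfies(self, required: HashDist) -> bool:
        # The requirement is met if any one of the alternatives matches it.
        return required in self.alternatives


def join_distribution(left: HashDist, right: HashDist) -> MultiDist:
    # After an equi-join on left.keys[i] = right.keys[i], both key lists
    # carry equal values in every output row, so both distributions hold.
    if len(left.keys) != len(right.keys):
        raise ValueError("equi-join key lists must have the same length")
    return MultiDist(frozenset({left, right}))


# T = (L join R on L.a = R.c and L.b = R.d)
t = join_distribution(HashDist(("a", "b")), HashDist(("c", "d")))
print(t.satisfies(HashDist(("a", "b"))))            # True
print(t.satisfies(HashDist(("c", "d"))))            # True
print(t.satisfies(HashDist(("a", "b", "c", "d"))))  # False
```

Keeping a set of alternatives, rather than a single value, is what lets a later operator that needs Hash(c,d) skip a reshuffle even though the join was planned around Hash(a,b).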



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
