[ 
https://issues.apache.org/jira/browse/CALCITE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748939#comment-17748939
 ] 

LakeShen commented on CALCITE-5871:
-----------------------------------

Hi [~grandfisher], I understand your problem: the partition is a table-level 
attribute, while the bucket HASH_DISTRIBUTION is a partition-level attribute.

Recently I have been reading the paper "Incorporating Partitioning and Parallel 
Plans into the SCOPE Optimizer", and you might be able to get some ideas from it.

> Data distributions need to be combined and represented.
> -------------------------------------------------------
>
>                 Key: CALCITE-5871
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5871
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: grandfisher
>            Priority: Major
>
> For a distributed, partitioned database, the data may be partitioned by time 
> and also hash partitioned by the `region` field.
> If there is an aggregate on "(Day, Region)", it is hard to express the 
> distribution of the Aggregate rel, which is (range(Day), hash(region)).
> Another case is a hash shuffle join, 
> `( L join R on L.a = R.c and L.b = R.d ) as T`. Here T satisfies two 
> distributions, Hash(a,b) and Hash(c,d), but not Hash(a,b,c,d). We must lose 
> one of them, because a RelDistribution can hold only one distribution.
> We think this is common in time-series distributed databases.
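
The two cases in the description can be made concrete with a small sketch. It 
is an illustration only: `DistributionSketch`, `Dist`, `DistSet` and 
`joinOutput` are hypothetical names and are not part of Calcite's API. It 
assumes an inner equi-join whose inputs were shuffled with the same hash 
function, so that both Hash(a,b) and Hash(c,d) hold for the output rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Calcite.
final class DistributionSketch {

    enum Kind { HASH, RANGE }

    /** One distribution: a kind plus its ordered key columns. */
    static final class Dist {
        final Kind kind;
        final List<String> keys;

        Dist(Kind kind, String... keys) {
            this.kind = kind;
            this.keys = List.of(keys);
        }

        @Override public boolean equals(Object o) {
            return o instanceof Dist
                && kind == ((Dist) o).kind
                && keys.equals(((Dist) o).keys);
        }

        @Override public int hashCode() {
            return 31 * kind.hashCode() + keys.hashCode();
        }

        @Override public String toString() {
            return kind + keys.toString();
        }
    }

    /** Alternative distributions that all hold for the same rows. */
    static final class DistSet {
        final List<Dist> alternatives = new ArrayList<>();

        DistSet add(Dist d) {
            if (!alternatives.contains(d)) {
                alternatives.add(d);
            }
            return this;
        }

        /** Deliberately strict: satisfied only by an exact match. */
        boolean satisfies(Dist required) {
            return alternatives.contains(required);
        }
    }

    /** Assumed inner equi-join: the output keeps both input distributions. */
    static DistSet joinOutput(Dist left, Dist right) {
        return new DistSet().add(left).add(right);
    }

    public static void main(String[] args) {
        // Join case: T satisfies Hash(a,b) and Hash(c,d), not Hash(a,b,c,d).
        DistSet t = joinOutput(new Dist(Kind.HASH, "a", "b"),
                               new Dist(Kind.HASH, "c", "d"));
        System.out.println(t.satisfies(new Dist(Kind.HASH, "a", "b")));
        System.out.println(t.satisfies(new Dist(Kind.HASH, "c", "d")));
        System.out.println(t.satisfies(new Dist(Kind.HASH, "a", "b", "c", "d")));

        // Aggregate case: a composite written as ordered levels,
        // range(Day) at the table level, then hash(region) within it.
        List<Dist> composite = List.of(new Dist(Kind.RANGE, "Day"),
                                       new Dist(Kind.HASH, "region"));
        System.out.println(composite);
    }
}
```

Keeping a set of alternatives, rather than a single value, is one possible way 
to avoid losing either Hash(a,b) or Hash(c,d). How such a set would interact 
with trait conversion in the planner is left open here.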



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
