[jira] [Updated] (CALCITE-5871) Data distributions need to be combined and represented.

grandfisher (Jira) Tue, 25 Jul 2023 07:06:32 -0700


     [ https://issues.apache.org/jira/browse/CALCITE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


grandfisher updated CALCITE-5871:
---------------------------------
    Description: 
In a distributed, partitioned database, data may be partitioned by time and
also hash-partitioned by the `region` field.
If an aggregate groups by "(Day, Region)", there is no way to express the
distribution of the resulting Aggregate rel, which is range(Day) combined with
hash(region).
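The following is a minimal sketch of what a combined distribution could look like. It uses a hypothetical model: the names `CompositeDistribution`, `Level` and `supportsLocalAggregate` are illustrative and are not Calcite's API. The sketch stores the partitioning levels in order, range(Day) first and then hash(region). It assumes that rows agreeing on every partitioning column always land in the same partition. Under that assumption, an aggregate whose group keys include all of those columns can run without a shuffle.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model for illustration only; not Calcite's RelDistribution API.
public class CompositeDistributionSketch {
    enum Kind { RANGE, HASH }

    /** One partitioning level, e.g. range(Day) or hash(region). */
    record Level(Kind kind, String column) {
        @Override public String toString() {
            return kind.name().toLowerCase(Locale.ROOT) + "(" + column + ")";
        }
    }

    /** Levels applied in order: range(Day) first, then hash(region) within each range. */
    record CompositeDistribution(List<Level> levels) {
        /** Assumes rows equal on every partitioning column share a partition, so
         *  grouping on a superset of those columns needs no exchange. */
        boolean supportsLocalAggregate(Set<String> groupKeys) {
            return levels.stream().allMatch(l -> groupKeys.contains(l.column()));
        }

        @Override public String toString() {
            return levels.stream().map(Level::toString).collect(Collectors.joining(" "));
        }
    }

    public static void main(String[] args) {
        CompositeDistribution dist = new CompositeDistribution(List.of(
                new Level(Kind.RANGE, "Day"), new Level(Kind.HASH, "region")));
        // prints "range(Day) hash(region): true"
        System.out.println(dist + ": " + dist.supportsLocalAggregate(Set.of("Day", "region")));
        // prints "range(Day) hash(region): false" (rows of one region span many day ranges)
        System.out.println(dist + ": " + dist.supportsLocalAggregate(Set.of("region")));
    }
}
```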

Another case is a hash-shuffle join, `(L JOIN R ON L.a = R.c AND L.b = R.d) AS T`.
The output T satisfies two distributions, Hash(a, b) and Hash(c, d); it does
not satisfy Hash(a, b, c, d). But we must drop one of them, because a
RelDistribution can represent only one distribution.
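The join case can be sketched with a hypothetical `DistributionSet` that keeps several equivalent hash distributions instead of a single one. The names `Hash`, `DistributionSet` and `afterHashJoin` are illustrative and are not Calcite's `RelDistribution` API. The sketch assumes both inputs were shuffled with the same hash function over type-compatible keys. Under that assumption, every joined row has a = c and b = d, and it lives in the partition chosen by either key pair.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; not Calcite's RelDistribution API.
public class MultiDistributionSketch {
    /** A hash distribution on an ordered list of columns. */
    record Hash(List<String> keys) {
        static Hash of(String... keys) { return new Hash(List.of(keys)); }
    }

    /** A rel may satisfy several equivalent distributions at the same time. */
    record DistributionSet(Set<Hash> alternatives) {
        boolean satisfies(Hash required) { return alternatives.contains(required); }
    }

    /** After a hash-shuffle equi-join, the output is distributed by the left keys and,
     *  equivalently, by the right keys, because joined rows carry equal key values. */
    static DistributionSet afterHashJoin(List<String> leftKeys, List<String> rightKeys) {
        // Set.copyOf tolerates duplicates, e.g. when both sides use the same key names.
        return new DistributionSet(Set.copyOf(List.of(new Hash(leftKeys), new Hash(rightKeys))));
    }

    public static void main(String[] args) {
        DistributionSet t = afterHashJoin(List.of("a", "b"), List.of("c", "d"));
        System.out.println(t.satisfies(Hash.of("a", "b")));           // true
        System.out.println(t.satisfies(Hash.of("c", "d")));           // true
        System.out.println(t.satisfies(Hash.of("a", "b", "c", "d"))); // false
    }
}
```

Hash(a, b, c, d) is rejected because hashing four columns generally assigns rows to different partitions than hashing the two join keys of either side.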

We think this situation is common in distributed time-series databases.

  was:
In a distributed, partitioned database, data may be partitioned by time and
also hash-partitioned by the `region` field.
If an aggregate groups by "(Day, Region)", it is hard to express the
distribution of the resulting Aggregate rel.

Another case is a hash-shuffle join, `(L JOIN R ON L.a = R.c AND L.b = R.d) AS T`.
The output T satisfies two distributions, Hash(a, b) and Hash(c, d); it does
not satisfy Hash(a, b, c, d). But we must drop one of them, because a
RelDistribution can represent only one distribution.

We think this situation is common in distributed time-series databases.


> Data distributions need to be combined and represented.
> -------------------------------------------------------
>
>                 Key: CALCITE-5871
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5871
>             Project: Calcite
>          Issue Type: Improvement
>          Components: server
>            Reporter: grandfisher
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
