[
https://issues.apache.org/jira/browse/CALCITE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
grandfisher updated CALCITE-5871:
---------------------------------
Description:
In a distributed, partitioned database, the data may be partitioned by time
and also hash-partitioned by the `region` field.
If an aggregate groups on "(Day, Region)", it is hard to express the
distribution of the Aggregate rel, which would be (range(Day), hash(region)).
Another case is a hash shuffle join: `(L join R on L.a = R.c and L.b = R.d)
as T`. T satisfies two distributions, Hash(a,b) and Hash(c,d), but not
Hash(a,b,c,d). We must lose one of them, because a RelDistribution can
represent only one distribution.
We think this situation is common in time-series distributed databases.
was:
In a distributed, partitioned database, the data may be partitioned by time
and also hash-partitioned by the `region` field.
If an aggregate groups on "(Day, Region)", it is hard to express the
distribution of the Aggregate rel.
Another case is a hash shuffle join: `(L join R on L.a = R.c and L.b = R.d)
as T`. T satisfies two distributions, Hash(a,b) and Hash(c,d), but not
Hash(a,b,c,d). We must lose one of them, because a RelDistribution can
represent only one distribution.
We think this situation is common in time-series distributed databases.
> Data distributions need to be combined and represented.
> -------------------------------------------------------
>
> Key: CALCITE-5871
> URL: https://issues.apache.org/jira/browse/CALCITE-5871
> Project: Calcite
> Issue Type: Improvement
> Components: server
> Reporter: grandfisher
> Priority: Major
>
> In a distributed, partitioned database, the data may be partitioned by time
> and also hash-partitioned by the `region` field.
> If an aggregate groups on "(Day, Region)", it is hard to express the
> distribution of the Aggregate rel, which would be (range(Day), hash(region)).
> Another case is a hash shuffle join: `(L join R on L.a = R.c and L.b = R.d)
> as T`. T satisfies two distributions, Hash(a,b) and Hash(c,d), but not
> Hash(a,b,c,d). We must lose one of them, because a RelDistribution can
> represent only one distribution.
> We think this situation is common in time-series distributed databases.
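
To make the two cases concrete, here is a minimal sketch of what a combined
representation might look like. It is an illustration only: `Level`,
`Composite`, `Alternatives` and `colocates` are hypothetical names and are not
part of Calcite. The sketch assumes that rows agreeing on every partitioning
key land in the same partition, so a grouping is local when its keys cover
those partitioning keys.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types are part of Calcite.
public class DistributionSketch {

    // One partitioning level, e.g. hash(region) or range(day).
    static final class Level {
        final String kind;          // "hash" or "range"
        final List<String> keys;

        Level(String kind, String... keys) {
            this.kind = kind;
            this.keys = Arrays.asList(keys);
        }
    }

    // Composite distribution: all levels apply at once,
    // e.g. range(day) combined with hash(region).
    static final class Composite {
        final List<Level> levels;

        Composite(Level... levels) {
            this.levels = Arrays.asList(levels);
        }

        // Assumption: rows that agree on every partitioning key share a
        // partition, so grouping is local when the group keys cover them all.
        boolean colocates(Set<String> groupKeys) {
            for (Level level : levels) {
                if (!groupKeys.containsAll(level.keys)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Equivalent alternatives: the relation satisfies every one of them,
    // e.g. hash(a, b) and hash(c, d) after joining on a = c and b = d.
    static final class Alternatives {
        final List<Composite> options;

        Alternatives(Composite... options) {
            this.options = Arrays.asList(options);
        }

        // Grouping is local if at least one alternative colocates it.
        boolean colocates(Set<String> groupKeys) {
            for (Composite option : options) {
                if (option.colocates(groupKeys)) {
                    return true;
                }
            }
            return false;
        }
    }

    static Set<String> keys(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        Composite byDayAndRegion =
            new Composite(new Level("range", "day"), new Level("hash", "region"));
        System.out.println(byDayAndRegion.colocates(keys("day", "region"))); // true
        System.out.println(byDayAndRegion.colocates(keys("region")));        // false

        Alternatives joinOutput = new Alternatives(
            new Composite(new Level("hash", "a", "b")),
            new Composite(new Level("hash", "c", "d")));
        System.out.println(joinOutput.colocates(keys("c", "d")));            // true
    }
}
```

With only Hash(a,b) retained for T, the last check would fail and a planner
might add a shuffle that is not needed. Keeping both alternatives avoids that,
at the cost of a larger trait to compare and propagate.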