Re: dependent join implementation

Julian Hyde Thu, 06 Jul 2017 10:56:49 -0700

I think you’re talking about this:
https://issues.apache.org/jira/browse/CALCITE-468

The essence of the trick is to rewrite “A join B” to “A join (B semi-join A’)”,
where A’ is a safe subset of A, perhaps “select distinct id from A”, and is
much smaller than A. It’s OK for A’ to have a few false positives (i.e. keys
that do not occur in A), and therefore Bloom filters are a good option.
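
To make the rewrite concrete, here is a minimal Python sketch of the idea. It
is not Calcite code and is not taken from CALCITE-468. The names BloomFilter,
dependent_join and scan_b are hypothetical, and scan_b stands in for a remote
source B that can apply a key predicate while scanning. The final join is
exact, so any false positives let through by the filter are dropped there.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def dependent_join(a_rows, scan_b, key="id"):
    """Compute A join B as A join (B semi-join A')."""
    # A': a compact summary of the distinct join keys of A.
    a_keys = BloomFilter()
    for row in a_rows:
        a_keys.add(row[key])

    # B semi-join A': B returns only rows whose key may occur in A.
    b_filtered = scan_b(lambda k: k in a_keys)

    # Exact hash join of A with the reduced B; false positives fall out here.
    a_by_key = {}
    for row in a_rows:
        a_by_key.setdefault(row[key], []).append(row)
    return [{**a, **b} for b in b_filtered for a in a_by_key.get(b[key], [])]


# Hypothetical data: A is small, B is large.
A = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]
B = [{"id": i, "value": i * 10} for i in range(1000)]


def scan_b(key_predicate):
    # Assumption: source B can evaluate the predicate during its scan.
    return [row for row in B if key_predicate(row["id"])]


result = dependent_join(A, scan_b)
```

In a real planner the filter would be pushed into the query sent to B (an IN
list or a serialized Bloom filter), which is where the savings come from. The
sketch only shows that the rewritten plan returns the same rows as the plain
join.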

Julian

> On Jul 5, 2017, at 11:22 PM, 魏阔 <[email protected]> wrote:
> 
> Hi all:
> 
> Dependent join performance is a huge challenge in processing multi-source
> joins. Instead of reading all of source A and all of source B and joining
> them on A.x = B.x, we want to read all of A, then build a set of A.x values
> that is passed as a criterion when querying B. In cases where A is small and
> B is large, this can drastically reduce the data retrieved from B, thus
> greatly speeding up the overall query.
> 
> I saw a similar idea implemented in Apache Teiid. Is there a similar rule
> in Calcite? We want to implement this but haven't thought it through
> clearly yet. Any suggestions?
> 
> Thanks!
> shanyao
