I think you’re talking about this: https://issues.apache.org/jira/browse/CALCITE-468
The essence of the trick is to rewrite “A join B” to “A join (B semi-join A’)”, where A’ is a safe subset of A, perhaps “select distinct id from A”, and is much smaller than A. It’s OK for A’ to have a few false positives (i.e. keys that do not occur in A), and therefore Bloom filters are a good option.

Julian

> On Jul 5, 2017, at 11:22 PM, 魏阔 <[email protected]> wrote:
>
> Hi all:
>
> Dependent join performance is a huge challenge in processing
> multi-source joins. Instead of reading all of source A and all of
> source B, and joining them on A.x = B.x, we want to read all of A, then
> build a set of A.x values that is passed as a criterion when querying B.
> In cases where A is small and B is large, this can drastically reduce
> the data retrieved from B, thus greatly speeding up the overall query.
>
> I saw a similar idea implemented in Apache Teiid. Is there a similar
> rule to do this in Calcite? We want to implement this, but still
> haven't thought it through clearly. Any suggestions?
>
> Thanks!
> shanyao
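For illustration, here is a minimal Python sketch of the semi-join reduction described above. It assumes in-memory lists of dict rows, and `BloomFilter` and `semi_join_reduced_join` are hypothetical names for this sketch, not Calcite APIs. The Bloom filter plays the role of A’, so some B rows that survive the semi-join may be false positives, and the final join discards them. Swapping the filter for a plain `set` of keys gives the exact “select distinct id from A” variant.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        data = repr(key).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def semi_join_reduced_join(a_rows, b_rows, key):
    """Join A and B on `key`, first reducing B with a Bloom filter built from A.

    Returns (joined (a_row, b_row) pairs, number of B rows kept by the semi-join).
    """
    # Step 1: build A' (here a Bloom filter) from A's join keys.
    a_prime = BloomFilter()
    for row in a_rows:
        a_prime.add(row[key])

    # Step 2: B semi-join A'. Rows whose key is definitely absent from A are dropped.
    reduced_b = [row for row in b_rows if a_prime.might_contain(row[key])]

    # Step 3: ordinary hash join of A with the reduced B; false positives fall out here.
    index = {}
    for row in reduced_b:
        index.setdefault(row[key], []).append(row)
    joined = [(a, b) for a in a_rows for b in index.get(a[key], [])]
    return joined, len(reduced_b)
```

In a federated setting the filter (or exact key set) would presumably be pushed down to B’s source as a predicate, so that B’s rows are never read; this sketch applies it after B is in memory only to show the logic.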
