Hello, I would like to resurrect this thread in the context of Calcite streams. Unfortunately, Bloom filters are not an option for the data sources being used.
Say one has a stream-to-table join
<https://calcite.apache.org/docs/stream.html#joining-streams-to-tables>.
From the docs example:

  SELECT STREAM o.productId, o.orderId, o.units, p.name, p.unitPrice
  FROM Orders AS o     -- streamable table
  JOIN Products AS p   -- reference data table
    ON o.productId = p.productId;

1. Am I correct to assume that each event in the Orders table (which is a stream) will trigger a full table scan (without a filter) on the Products table?
2. Can I register a custom rule to rewrite the query when, say, the Orders and Products tables are present, to manually add a sub-query?
3. Do I have to disable SubQueryRemoveRule in this case?
4. Vadym, I am not sure how sub-query computation will work. Can I partially execute the query and convert the sub-query into EnumerableValues?

Is there a way to solve this problem non-generically? We're also hitting this limitation in Flink (which uses Calcite but not Calcite streams) for a similar use case.

Many thanks,
Andrei.

On Thu, Aug 30, 2018 at 5:27 PM Vineet Garg <[email protected]> wrote:

> Hive actually does this optimization (it is called semi-join reduction) by
> generating bloom-filters on one side and passing it on to the other side.
> This is not a rewrite but instead a physical implementation.
>
> Vineet
>
> On Aug 29, 2018, at 10:34 AM, Vladimir Sitnikov <[email protected]> wrote:
>
> Nested loops are never likely to happe
>
> What's wrong with that?
>
> Apparently Andrei asks for that, and "subquery precomputation" is quite
> close to nested loops in my opinion.
>
> Vladimir
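
To make question 1 above concrete, here is a minimal plain-Java sketch of the difference between scanning Products for every Orders event and fetching only the matching row by key. It is a toy under stated assumptions, not Calcite or Flink code: the class `StreamTableJoinSketch`, its nested types, and both join methods are hypothetical names, and the in-memory list and map merely stand in for the real data source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these types are Calcite or Flink classes.
final class StreamTableJoinSketch {

    static final class Order {
        final int orderId, productId, units;
        Order(int orderId, int productId, int units) {
            this.orderId = orderId;
            this.productId = productId;
            this.units = units;
        }
    }

    static final class Product {
        final int productId;
        final String name;
        final double unitPrice;
        Product(int productId, String name, double unitPrice) {
            this.productId = productId;
            this.name = name;
            this.unitPrice = unitPrice;
        }
    }

    /** Joined rows plus the number of Products rows that had to be read. */
    static final class Result {
        final List<String> rows = new ArrayList<>();
        int productRowsRead = 0;
    }

    // Same column order as the SELECT list in the docs example.
    static String joinRow(Order o, Product p) {
        return o.productId + "," + o.orderId + "," + o.units + "," + p.name + "," + p.unitPrice;
    }

    /** Every order event reads the whole Products table and filters afterwards. */
    static Result joinWithFullScan(List<Order> orders, List<Product> products) {
        Result r = new Result();
        for (Order o : orders) {
            for (Product p : products) {
                r.productRowsRead++;
                if (p.productId == o.productId) {
                    r.rows.add(joinRow(o, p));
                }
            }
        }
        return r;
    }

    /** Every order event reads only the matching product, as a pushed-down key filter would. */
    static Result joinWithKeyLookup(List<Order> orders, Map<Integer, Product> productsById) {
        Result r = new Result();
        for (Order o : orders) {
            Product p = productsById.get(o.productId);
            if (p != null) {
                r.productRowsRead++;
                r.rows.add(joinRow(o, p));
            }
        }
        return r;
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
            new Product(1, "paint", 10.0),
            new Product(2, "brush", 2.5),
            new Product(3, "tape", 1.0));
        Map<Integer, Product> byId = new HashMap<>();
        for (Product p : products) {
            byId.put(p.productId, p);
        }
        List<Order> orders = List.of(new Order(100, 2, 4), new Order(101, 3, 1));

        Result scan = joinWithFullScan(orders, products);
        Result lookup = joinWithKeyLookup(orders, byId);
        System.out.println("full scan read " + scan.productRowsRead + " product rows");
        System.out.println("key lookup read " + lookup.productRowsRead + " product rows");
        System.out.println(lookup.rows);
    }
}
```

Both methods return the same joined rows; only the number of Products rows touched differs. Whether a given Calcite adapter can actually receive such a key filter per event depends on the adapter and the planner rules in play, which is what the questions above are asking.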
