Hi Vladimir,

Thanks for the follow-up and explanation. I wanted to make sure I'm not missing
(or misunderstanding) anything.

Andrei.



On Wed, Aug 29, 2018 at 11:01 AM Vladimir Sitnikov <
[email protected]> wrote:

> One approach to such queries is to throw Bloom filters all over
> the place.
>
> That is, it could execute the "small side" of the join, collect the ids (or a
> lossy version of them in the form of a Bloom filter),
> and propagate that Bloom filter to the second row source to reduce the
> set of rows it produces.
> Then the join would be easier to do since the second row source is reduced.
>
> The sad thing is that not all systems support propagation of Bloom filters.
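
To make the Bloom filter idea concrete, here is a minimal Python sketch of it,
not anything Calcite ships. The names BloomFilter and bloom_join, the filter
size, and the hashing scheme are all assumptions made for illustration. It
builds the filter from the small side, uses it to prune the large side, and
then does an exact join on the surviving rows.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits      # assumed size, not tuned
        self.num_hashes = num_hashes  # assumed hash count, not tuned
        self.bits = 0                 # a Python int used as a bit array

    def _positions(self, key):
        # Derive several bit positions from seeded SHA-256 digests of the key.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def bloom_join(small_rows, large_rows, key="id"):
    """Hypothetical helper: prune large_rows with a filter built from small_rows."""
    bloom = BloomFilter()
    by_key = {}
    for row in small_rows:
        bloom.add(row[key])
        by_key.setdefault(row[key], []).append(row)

    # "Propagate" the filter: only rows that may match survive the scan.
    candidates = [r for r in large_rows if bloom.might_contain(r[key])]

    # Exact join on the reduced set; false positives are dropped here.
    joined = [(s, r) for r in candidates for s in by_key.get(r[key], [])]
    return candidates, joined
```

The filter is lossy in one direction only, so the exact match after the scan
is still needed to discard false positives.
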
>
> > select * from
> >   t1 join t2 on (t1.id = t2.id)
> > where
> >   t2.id in (select id from t1) -- force sub selec
>
> What if Calcite did just a regular batched nested loop join?
> That is:
> 1. Fetch next 10 rows from t1
> 2. Fetch "from t2 where id in (...)"
> 3. goto 1
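
The three steps above can be sketched in Python as follows. This is only an
illustration under assumptions: fetch_t2_by_ids is a hypothetical callback
standing in for "from t2 where id in (...)", and the function names and the
default batch size of 10 are mine, not Calcite's.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def batched_nested_loop_join(t1_rows, fetch_t2_by_ids, batch_size=10, key="id"):
    """Join t1 to t2 by probing t2 once per batch of t1 rows.

    fetch_t2_by_ids(ids) is a hypothetical callback that returns the t2 rows
    whose key is in `ids`.
    """
    # Step 1: fetch the next batch of rows from t1.
    for batch in chunks(t1_rows, batch_size):
        ids = {row[key] for row in batch}

        # Step 2: one probe of t2 for the whole batch.
        matches = {}
        for right in fetch_t2_by_ids(ids):
            matches.setdefault(right[key], []).append(right)

        # Emit joined pairs, preserving the t1 order within the batch.
        for left in batch:
            for right in matches.get(left[key], []):
                yield left, right
        # Step 3: the loop continues with the next batch.
```

With 25 rows in t1 and a batch size of 10, this issues three probes against
t2 instead of 25.
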
>
> It can be expressed via correlated subqueries, however:
> a) I'm not sure correlated subqueries work well at the moment
> b) Support for "batched" correlated execution is likely not there
> c) Calcite should somehow know the true cost of "from t2 where id in (1,2)"
> vs "from t2 where id in (1,2,3,4)". In other words, the current costing model
> does not take into account whether the table has an index. One can code such
> costing rules, however I think they are not there yet.
>
> Vladimir
>
