When you implement an equi-join as a nested loops join, the right-hand side
always has a filter combining the variable set by the left-hand side and the
join column on the right-hand side.
You will need to do a full scan of the right-hand table every time, unless your
table is organized so that you can access a subset that matches the filter.
In your example
select emp.name
from emp join dept on emp.deptid = dept.id
where dept.name = ‘Sales'
The simple plan would be
Correlate (v := deptno)
Filter name = ’Sales'
Scan dept
Filter deptno = $v
Scan emp
If “emp” was physically ordered by deptno, the plan could be optimized to
Correlate (v := deptno)
Filter name = ’Sales'
Scan dept
RangeScan emp deptno = $v
> On Aug 7, 2019, at 11:41 PM, Gabriel Reid <[email protected]> wrote:
>
> Hi Stamatis,
>
> Thank you so much, this is exactly the kind of info I was looking for! I
> think I can figure out how to go forward based on this, thanks again.
>
> - Gabriel
>
> On Thu, Aug 8, 2019 at 8:19 AM Stamatis Zampetakis <[email protected]>
> wrote:
>
>> Hi Gabriel,
>>
>> What you want indeed seems to be a nested loop join; there has been a
>> relevant discussion in the dev list [1].
>> You may also find relevant the ongoing discussion on CALCITE-2979 [2].
>>
>> Best,
>> Stamatis
>>
>> [1]
>>
>> https://lists.apache.org/thread.html/d9f95683e66009872a53e7e617295158b98746b550d2bf68230b3096@%3Cdev.calcite.apache.org%3E
>>
>> [2] https://issues.apache.org/jira/browse/CALCITE-2979
>>
>> On Wed, Aug 7, 2019 at 2:56 PM Gabriel Reid <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm currently working on a custom Calcite adapter, and I've got a
>> situation
>>> where I want to join a small table with a large table, basically just
>> using
>>> the small table to filter the large table. Conceptually, I'm talking
>> about
>>> something that is equivalent to the following query:
>>>
>>> select emp.name
>>> from emp join dept on emp.deptid = dept.id
>>> where dept.name = 'Sales'
>>>
>>> I've got converter rules which to push down filtering on all tables,
>> which
>>> make a big difference in performance. The above query current results in
>> an
>>> EnumerableHashJoin over a filtered scan over the 'dept' table, and an
>>> unfiltered scan over the 'emp' table.
>>>
>>> What I would like to accomplish is that this is converted into a (I
>> think)
>>> a nested loop join between 'dept' and emp, so that the filtered scan is
>>> done once over 'dept', and then a filtered scan is done for each entry of
>>> the 'dept' table using the 'id' value from that entry as a filter value
>> on
>>> the scan on the 'emp' table.
>>>
>>> I've been able to get a nested loop join to be used, but I haven't
>> managed
>>> to have the 'id' values from the 'dept' table to be used to filter the
>>> 'emp' table. Instead, the full 'emp' table is scanned for each iteration
>> of
>>> the 'dept' table.
>>>
>>> And now my questions: are my hopes/expectations here realistic, and/or is
>>> there a much better/easier way of accomplishing the same thing? Is there
>>> prior art somewhere within the Calcite code base or elsewhere? Or should
>>> this just be working by default?
>>>
>>> I would assume that this isn't that unusual of a situation, which is why
>> I
>>> was expecting that there would already be something like this somewhere
>> (or
>>> I'm doing the wrong thing), but I haven't managed to find any clear
>>> pointers in any one direction yet.
>>>
>>> Thanks in advance for any advice!
>>>
>>> - Gabriel
>>>
>>