[
https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
A B updated DERBY-3603:
-----------------------
Attachment: d3603_v1.patch
The fact that "oneRowRightSide" is true for the NestedLoop join does indeed
appear to be important here. Good find, Thomas! With that in mind I did some
tracing of the execution path. My findings are below.
The query I used for tracing is as follows:
select account.admin_unit_id, account.account_id,
booking.booking_id, cast (admin_unit.admin_unit_name as varchar(30))
from spike.accounts account, spike.bookings booking,
spike.admin_units admin_unit
where account.account_id = booking.account_id
and admin_unit.admin_unit_id = account.admin_unit_id
and admin_unit.admin_unit_id in (1, 21)
and booking.child_id = 2;
When I run this query the execution plan shows a join order of:
(ACCOUNTS x BOOKINGS) x ADMIN_UNITS
That is, we join ACCOUNTS and BOOKINGS to get a row R, then for each row R from
that join, we join it (R) with a row from ADMIN_UNITS. Due to the presence of
the IN list the optimizer chooses to do multi-probing on ADMIN_UNITS, which
means the probe predicate that represents "admin_unit.admin_unit_id IN (1, 21)"
becomes the start and stop key. Since we only allow a single start/stop key
for any column, the join predicate between ACCOUNTS and ADMIN_UNITS is
relegated to being a store qualifier. This is notable for two reasons: 1) this
explains the additional qualifier seen by Bryan in the query plan, and 2) as
seen below, this qualifier is what causes the "oneRowRightSide" variable to get
set to TRUE for ADMIN_UNITS.
Also note: before the change for DERBY-3061 the optimizer always sorted the
join predicate as "stronger" than the probe predicate, and thus always chose
the join predicate as the start/stop key. That in turn meant that the probe
predicate, which can only be used for probing if it is the start/stop key, was
"reverted" back to a normal IN list qualifier. So the problem described below
did not show up. But with DERBY-3061 the probe predicate and the join
predicate have the same "strength" and thus the optimizer is free to use either
one as the start/stop key. In this example it chooses the probe predicate.
All of that said, let's take a step back and note that the join between
ACCOUNTS and BOOKINGS returns three rows:
select account.admin_unit_id, account.account_id, booking.booking_id
from spike.accounts account, spike.bookings booking
where account.account_id = booking.account_id
and booking.child_id = 2;
ADMIN_UNIT&|ACCOUNT_ID |BOOKING_ID
-----------------------------------
1 |10 |1
1 |10 |3
1 |10 |4
So now, for *each* of the above three rows we want to fetch the matching row
from ADMIN_UNITS. So far so good.
The "oneRowRightSide" variable that Thomas mentioned comes into play because
the optimizer thinks that for every outer row, i.e. for each of the three rows
shown above, we will have at MOST one matching row from ADMIN_UNITS. At first
I thought this was incorrect--but upon further inspection it turns out be true.
The optimizer deduced this because the predicate refers to a primary key
column, which has to be unique. So when we try to fetch a row such that the
ADMIN_UNIT_ID column of ADMIN_UNITS equals some value, we will get at most one
row back. And since we have the join predicate as a store qualifier (mentioned
above), we know that we _will_ in fact be trying "to get a row such that the
ADMIN_UNIT_ID column of ADMIN_UNITS equals some value", where "some value" will
be the ADMIN_UNIT_ID column from the intermediate join table shown above.
(Note: the intermediate join table doesn't actually exist, it's just shown as
such for discussion).
So at this point things are still working as expected. We'll take the first
row from the intermediate join, i.e. "(1, 10, 1)", and we'll do two things.
First we'll get the "next" probe value in the list of values. Since the probe
list is "(1, 21)", the next value is "1". Second, we will see if there is a
row in ADMIN_UNITS such that:
a) the ADMIN_UNIT_ID column of the row equals the current probe
value, which is "1", AND
b) the ADMIN_UNIT_ID column of the row equals the ADMIN_UNIT_ID of
the outer row, which is "(1, 10, 1)".
Since both of these conditions are true, we'll return the first row as
expected. So our result at this point is:
ADMIN_UNIT&|ACCOUNT_ID |BOOKING_ID
-----------------------------------
1 |10 |1
Now we attempt to fetch the "next" row from ADMIN_UNITS that matches the outer
row "(1, 10, 1)". But as mentioned earlier, the NestedLoop result set that
would do so has "oneRowRightSide" set to TRUE. This means that the NestedLoop
_knows_ it will not find any more rows in ADMIN_UNITS that satsify conditions
"a" and "b" shown above. So instead of going to the table, >>> it just QUITS
the scan <<<. See the following in NestedLoopJoinResultSet:
if (oneRowRightSide && returnedRowMatchingRightSide)
{
rightRow = null;
returnedRowMatchingRightSide = false;
}
So we quit the inner scan of ADMIN_UNITS, which leads us to get the next
"outer" row from the intermediate join rows shown earlier. I.e. we'll now set
our outer row to be "(1, 10, 3)". Then we'll execute:
if (leftRow == null)
{
closeRight();
}
else
{
rowsSeenLeft++;
openRight();
}
"leftRow" here is the same as "outer" row, so it is "(1, 10, 3)" at this point.
Since it's not null we'll re-open the scan on ADMIN_UNITS, per the "else"
branch. Then, similar to what we did with the first outer row, we'll get the
"next" probe value in the list of values for ADMIN_UNITS.
But this is where the problem occurs. When we "quit" the scan in the above IF
statement we left the MultiProbe result set in the middle of a scan. So when
we make the call to re-open the scan, we need to RESET the probing state. But
the code in MultiProbeTableScanResultSet does not currently do that. More on
that below.
For the current codeline we do *not* reset the probe scan, and thus the next
probe value we see is "21", from the list (1, 21). With that we will then try
to get a row from ADMIN_UNIT_ID such that:
a) the ADMIN_UNIT_ID column of the row equals the current probe
value, which is >>> "21" <<<, AND
b) the ADMIN_UNIT_ID column of the row equals the ADMIN_UNIT_ID of
the outer row, which is "(1, 10, 3)".
But ADMIN_UNITS does not contain any rows which satisfy condition "a", so we
will not return a row. Having no row from ADMIN_UNITS, we go on to fetch the
next "outer" row from the intermediate join table, meaning we end up with "(1,
10, 4)". So we'll use that outer row and do what we did with the others: i.e.
First we'll get the next probe value in the list of values. The logic in
MultiProbeTableScan result set sees that we have exhaused the probe list at
this point, so *NOW* it RESETS the probing state. After that it gets the
"next" probe value, which is the first value in the list (again), and thus we
end up with a probe value of "1". Second, we will see if there is a row in
ADMIN_UNITS such that:
a) the ADMIN_UNIT_ID column of the row equals the current probe
value, which is "1", AND
b) the ADMIN_UNIT_ID column of the row equals the ADMIN_UNIT_ID of
the outer row, which is "(1, 10, 4)".
Since both of these conditions are true, we'll return the row as expected. So
our result at this point is:
ADMIN_UNIT&|ACCOUNT_ID |BOOKING_ID
-----------------------------------
1 |10 |1
1 |10 |4
And that's how the query ends--missing a row.
So what's going on with MultiProbeTableScanResultSet? There are some comments
in the "reopenCore()" method of that class which explain a) why we do NOT reset
the probe state in some cases, and b) why we DO reset the probe state in
others. For reference the comment is:
/* There are two scenarios for which we reopen this kind of scan:
*
* A - The first is for join processing. In this case we have
* a(nother) row from some outer table and we want to reopen this
* scan to look for rows matching the new outer row.
*
* B - The second is for multi-probing. Here we want to reopen
* the scan on this table to look for rows matching the next value
* in the probe list.
*
* If we are reopening the scan for scenario A (join processing)
* then we need to reset our position within the probe list.
* If we are reopening the scan for scenario B then we do *not*
* want to reset our position within the probe list because that
* position tells us where to find the next probe value.
*
* The way we tell the difference between the two scenarios is
* by looking at our current position in the probe list (i.e. the
* value of probeValIndex): if our current position is beyond the
* length of the probe list then we know that we are reopening the
* scan for scenario A. Or put another away, we should never get
* here for scenario B if probeValIndex is greater than or equal
* to the length of the probe list. The reason is that the call
* to reopenCore() for scenario B will only ever happen when
* moreInListVals() returns true--and in that case we know that
* probeValIndex will be less than the length of the probeValues.
*/
This comment states one thing and assumes another. First it states (correctly)
that if our current position is beyond the length of the probe list then we
must be seeing scenario A. It then goes on to assume (incorrectly) that if the
current position is LESS than the length of the probe list we must be seeing
scenario B. It turns out that this is *NOT* true. The reason it is not true
is because of the "oneRowRightSide" processing described above: when that flag
is set to true we quit the scan early, meaning that the current probe position
will in fact be less than the length of the probe list. Then we later come
back with a(nother) row from the outer table (i.e. the intermediate join table
shown above) and "we want to reopen this scan to look for rows matching the new
outer row". That means we need to reset the probe state: but due to the
incorrect assumption, we don't reset it. The result is incorrect results as
described above.
I'm attaching a first attempt at a fix, d3603_v1.patch. With this patch
applied the repro case correctly returns three rows. The patch does not
include a test case and I don't know yet if it passes the regression tests.
I'm running them now but they won't complete for several more hours. So far
the lang/subqueryFlattening test has failed with a plan diff, which may require
some investigation...
I'm still quite limited on time so if it is agreed that this is the correct fix
(or at least, the right direction), it'd be great if someone could add the test
and take it to completion--esp. if it could happen before the next 10.4 release
candidate. If no one picks it up I'll try to get to it sometime next week, but
probably not in time for the second RC...
Regardless of whether or not this writeup proves correct, many thanks again to
Bryan and Thomas for their investigation and comments. As no one officially
assigned the issue to him/herself I spent some time investigating, as written
above. I hope I didn't step on anyone's toes in the process.
> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
> Key: DERBY-3603
> URL: https://issues.apache.org/jira/browse/DERBY-3603
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.3.2.1, 10.4.1.1
> Reporter: David Butterworth
> Attachments: d3603_v1.patch, derbydb.jar, derbydb.tar.bz2
>
>
> Derbys' 'IN' clause is returning different results depending on which side of
> a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of
> items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/[EMAIL
> PROTECTED]
> Using the test database attatched the following 2 queries produce the issue:
> ij> select count(*) FROM spike.accounts account, spike.admin_units
> admin_unit,
> spike.bookings booking
> WHERE booking.child_id = 2 AND
> admin_unit.admin_unit_id IN (1,21) AND
> booking.booking_date_time_out >= 20080331000000 AND
> booking.booking_date_time_in <= 20080406235900 AND
> account.account_id = booking.account_id AND
> admin_unit.admin_unit_id = account.admin_unit_id;
> 1
> -----------
> 2
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
> spike.bookings booking
> WHERE booking.child_id = 2 AND
> account.admin_unit_id IN (1,21) AND
> booking.booking_date_time_out >= 20080331000000 AND
> booking.booking_date_time_in <= 20080406235900 AND
> account.account_id = booking.account_id AND
> admin_unit.admin_unit_id = account.admin_unit_id;
> 1
> -----------
> 3
> 1 row selected
> ij>
> The only difference between the 2 statements is which side of a join the 'IN'
> clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
> admin_unit.admin_unit_id IN (1)
> or
> account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
> qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
> qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
> Number of rows qualified=2
> Number of rows visited=3
> but in the second case we see:
> Number of rows qualified=3
> Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.