[
https://issues.apache.org/jira/browse/DERBY-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588661#action_12588661
]
A B commented on DERBY-3603:
----------------------------
Thank you very much for picking this one up, Bryan. I greatly appreciate it.
On the subqueryFlattening.diff:
To see why the diff occurs, I stepped through the following exercise. I did
not actually trace through the code, but I walked through what I *think* should
be happening and it seems to have explained the diff...
At the time of the statements in question the data in the DOCS and COLLS tables
is as follows:
DOCS: ID (VARCHAR) =>
('124'), ('24'), ('25'), ('27'), ('36'), ('567')
COLLS: ID (VARCHAR), COLLID (SMALLINT) =>
('123', 2), ('124', -5), ('24', 1), ('26', -2), ('36', 1), ('37', 8)
Now assume the query in question is:
SELECT count(ID) FROM DOCS
WHERE ID <> ANY (SELECT ID FROM COLLS WHERE COLLID IN (-2,1))
The ANY will be flattened giving us something to the effect of:
SELECT count(ID) FROM DOCS, COLLS
WHERE DOCS.ID <> COLLS.ID AND COLLS.COLLID IN (-2, 1)
The IN list for the queries that we're talking about is (-2, 1). So per the
explanation I posted on Friday, we'd end up with something like the following
withOUT the change.
Note: Lines preceded by "+" indicate that we read a row from the inner table,
COLLS. Lines preced by "*' indicate that what we do for the line changes with
the proposed patch for this issue. Lines preceded by an "x" indicate that we
"filter" a row from COLLS, meaning we read it but then we discard it.
- Get DOCS row ('124').
- Get first IN list value, -2.
+ Get row from COLLS where COLLID is -2, i.e. ('26', -2).
- See if ID column of ('26', -2) is NOT EQUAL to '124'. They
are not equal so count the '124' row from DOCS and quit the
scan of COLLS.
- Get DOCS row ('24').
* Get next IN list value, 1.
+ Get row from COLLS where COLLID is 1, i.e. ('24', 1).
x See if ID column of ('24', 1) is NOT EQUAL to '24'. They
are EQUAL so we skip the row ('24', 1) and continuing our
scan of COLLS.
+ Get next row from COLLS where COLLID is 1, i.e. ('36', 1).
- See if ID column of ('36', 1) is NOT EQUAL to '24'. They
are not equal so count the '24' row from DOCS and quit the
scan of COLLS.
- Get DOCS row ('25').
- Implicitly RESET the probe state, then get the "next" IN
list value, which will be -2.
+ Get row from COLLS where COLLID is -2, i.e. ('26', -2).
- See if ID column of ('26', -2) is NOT EQUAL to '25'. They
are not equal so count the '25' row from DOCS and quit the
scan of COLLS.
- Get DOCS row ('27').
* Get next IN list value, 1.
+ Get row from COLLS where COLLID is 1, i.e. ('24', 1).
- See if ID column of ('24', 1) is NOT EQUAL to '27'. They
are not equal so count the '27' row from DOCS and quit the
scan of COLLS.
- Get DOCS row ('36').
- Implicitly RESET the probe state, then get the "next" IN
list value, which will be -2.
+ Get row from COLLS where COLLID is -2, i.e. ('26', -2).
- See if ID column of ('26', -2) is NOT EQUAL to '36'. They
are not equal so count the '36' row from DOCS and quit the
scan of COLLS.
- Get DOCS row ('567').
* Get next IN list value, 1.
+ Get row from COLLS where COLLID is 1, i.e. ('24', 1).
- See if ID column of ('24', 1) is NOT EQUAL to '567'. They
are not equal so count the '567' row from DOCS and quit the
scan of COLLS.
- No more rows from DOCS, so done.
We can see from this that we get a row from COLLS a total of 7 times. In the
process we "filtered" one row out, i.e. the row ('24', 1), because that row has
the same ID has a row in DOCS. Or put in terms of the scan statistics, we have
7 "rows seen" and one (1) row filtered. And the query returns a count total of
6 rows from DOCS.
But now if we do the same exercise with the patch applied, we see the following:
- Get DOCS row ('124').
- Get first IN list value, -2.
+ Get row from COLLS where COLLID is -2, i.e. ('26', -2).
- See if ID column of ('26', -2) is NOT EQUAL to '124'. They
are not equal so count the '124' row from DOCS and quit the
scan of COLLS.
- Get DOCS row ('24').
* RESET probe state, then get "next" IN list value, which
is -2 (again).
+ Get row from COLLS where COLLID is -2, i.e. ('26', 1).
- See if ID column of ('26', -2) is NOT EQUAL to '24'. They
are not equal so count the '24' row from DOCS and quit the
scan of COLLS.
- Get DOCS row ('25').
* RESET probe state, then get "next" IN list value, which
is -2 (again).
+ Get row from COLLS where COLLID is -2, i.e. ('26', 1).
- See if ID column of ('26', -2) is NOT EQUAL to '25'. They
are not equal so count the '25' row from DOCS and quit the
scan of COLLS.
- Get DOCS row ('27').
* RESET probe state, then get "next" IN list value, which
is -2 (again).
+ Get row from COLLS where COLLID is -2, i.e. ('26', 1).
- See if ID column of ('26', -2) is NOT EQUAL to '27'. They
are not equal so count the '27' row from DOCS and quit the
scan of COLLS.
- Get DOCS row ('36').
* RESET probe state, then get "next" IN list value, which
is -2 (again).
+ Get row from COLLS where COLLID is -2, i.e. ('26', 1).
- See if ID column of ('26', -2) is NOT EQUAL to '36'. They
are not equal so count the '36' row from DOCS and quit the
scan of COLLS.
- Get DOCS row ('567').
* RESET probe state, then get "next" IN list value, which
is -2 (again).
+ Get row from COLLS where COLLID is -2, i.e. ('26', 1).
- See if ID column of ('26', -2) is NOT EQUAL to '567'. They
are not equal so count the '567' row from DOCS and quit the
scan of COLLS.
- No more rows from DOCS, so done.
Now the total number of rows we "see" from COLLS is 6 instead of 7. This is
because we never fetched the row ('24', 1) from the table. Instead, we only
had to fetch the one row, ('26', -2), over and over. Then, since we never
fetched row ('24', 1) from the table we didn't have to filter it out. So we
end up with a total of 6 "rows seen" and 0 rows filtered. Meanwhile the query
still returns a count total of 6 rows from DOCS.
I didn't run through this exercise for the other cases, but I imagine (hope!)
things would work out in a similar fashion...Does that seem like a reasonable
explanation to you? If so, then I think it's okay to update the master file
for subqueryFlattening.sql for this issue.
Oh, and thanks for writing the standalone repro, as well!
> 'IN' clause ignores valid results, incorrect qualifier handling suspected
> -------------------------------------------------------------------------
>
> Key: DERBY-3603
> URL: https://issues.apache.org/jira/browse/DERBY-3603
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.3.2.1, 10.4.1.1
> Reporter: David Butterworth
> Assignee: Bryan Pendleton
> Attachments: d3603_v1.patch, d3603_with_test.diff, derbydb.jar,
> derbydb.tar.bz2, subqueryFlattening.diff
>
>
> Derbys' 'IN' clause is returning different results depending on which side of
> a joined table
> I am doing my 'IN' comparison against. This only occurs when the number of
> items within the 'IN' clause is greater then 1.
> This behaviour was also confirmed by Bryan Pendleton in this thread:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/[EMAIL
> PROTECTED]
> Using the test database attatched the following 2 queries produce the issue:
> ij> select count(*) FROM spike.accounts account, spike.admin_units
> admin_unit,
> spike.bookings booking
> WHERE booking.child_id = 2 AND
> admin_unit.admin_unit_id IN (1,21) AND
> booking.booking_date_time_out >= 20080331000000 AND
> booking.booking_date_time_in <= 20080406235900 AND
> account.account_id = booking.account_id AND
> admin_unit.admin_unit_id = account.admin_unit_id;
> 1
> -----------
> 2
> 1 row selected
> ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
> spike.bookings booking
> WHERE booking.child_id = 2 AND
> account.admin_unit_id IN (1,21) AND
> booking.booking_date_time_out >= 20080331000000 AND
> booking.booking_date_time_in <= 20080406235900 AND
> account.account_id = booking.account_id AND
> admin_unit.admin_unit_id = account.admin_unit_id;
> 1
> -----------
> 3
> 1 row selected
> ij>
> The only difference between the 2 statements is which side of a join the 'IN'
> clause is matched against.
> Bryan performed some initial testing and stated the following:
> --------------------- SNIP ------------------------
> Interestingly, although the actual results do NOT contain any values
> for admin_unit_id = 21, if I change the query to:
> admin_unit.admin_unit_id IN (1)
> or
> account.admin_unit_id IN (1)
> then the problem disappears -- I get 3 rows for both queries.
> I also ran query plans for both the queries (in the IN (1,21) case)
> and have pasted the (simplified) query plans at the end of this message.
> I notice that in the case where the query gives 2 rows, which is
> when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
> qualifiers:
> Column[0][0] Id: 0
> Operator: =
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> However, in the case where the query gives 3 rows, which is
> when we specify account.admin_unit_id in (1,21), the admin_unit_id
> index scan output in the query plan contains:
> qualifiers:
> None
> I think it is the presence/absence of this qualifier on the query
> scan which is causing the different results in the query, as in
> the first case we see:
> Number of rows qualified=2
> Number of rows visited=3
> but in the second case we see:
> Number of rows qualified=3
> Number of rows visited=3
> I definitely don't have any explanation for why you are getting
> this odd behavior; it certainly seems like a bug to me.
> -------------END SNIP -----------------------
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.