[
https://issues.apache.org/jira/browse/DERBY-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
A B updated DERBY-3288:
-----------------------
Attachment: d3288_v2.patch
Attaching d3288_v2.patch, which is a first attempt a complete solution for this
issue. The patch does the following:
1. Move all of the existing "pull optimizable" processing out
of the getNextPermutation() method and into its own method,
pullOptimizableFromJoinOrder(), for the sake of clarity.
2. Add a call to pullOptimizableFromJoinOrder() that updates
assignedTableMap to correctly reflect the pulling of the
optimizable in question.
3. Make the getNextPermutation() metod call the new method
(pullOptimizableFromJoinOrder()) **BEFORE** trying to
determine what the next legal optimizable for the join
order can be.
Changes 1 thru 3 above fix the query as reported in this Jira.
But with those changes, store/SpaceTable.sql was failing with an ASSERT failure
just as it did for d3288_incomplete_v1.patch. I looked into this and
eventually tracked it down to the fact that the FromVTI class misuses the
"referencedTableMap" field that it inherits from ResultSetNode.java. More
specifically, that field is intended to hold table numbers for all FromTables
which appear at or _beneath_ a given FromTable in the query tree. But for
FromVTI, the referencedTableMap was holding table numbers for all FromTables
which were referenced by the _arguments_ to the FromVTI. As an example, take
the following query (which is pulled from store/SpaceTable.sql):
select conglomeratename, isindex,
numallocatedpages, numfreepages, pagesize, estimspacesaving
from SYS.SYSSCHEMAS s, SYS.SYSTABLES t,
new org.apache.derby.diag.SpaceTable(SCHEMANAME,TABLENAME) v
where s.SCHEMAID = t.SCHEMAID
and s.SCHEMANAME = 'APP'
order by conglomeratename;
In this query SpaceTable will be parsed as a FromVTI and will get its own table
number; SYSSCHEMAS and SYSTABLES will each get its own table number, as well.
That said, the referencedTableMap for the FromVTI should only contain the table
number of the FromVTI itself--because there are no other FromTables beneath the
FromVTI in the query tree (esp. FromVTI doesn't have any children nodes that
are instances of FromTable). But with the current codeline, the arguments for
SpaceTable will be processed and their table numbers will be added to FromVTI's
referencedTableMap, so that FromVTI's referenced table map will contain:
{ SYSSCHEMAS, SYSTABLES, SpaceTable }
instead of
{ SpaceTable }
This confuses the optimizer's dependency checking, which assumes that a
FromTable's referenced map only contains table numbers for itself and any
FromTable children. The result of this confusion is a join order that puts
SpaceTable _before_ SYSSCHEMAS and SYSTABLES, which means that attempts to
generate the FromVTI wll fail because the tables that it references have not
yet been generated.
Note that FromVTI _does_ have a dependency on SYSSCHEMAS and SYSTABLES as a
result of its arguments--that part is correct. And the current code _does_ try
to enforce that dependency. But the misuse of referencedTableMap causes the
current code to do the wrong thing. This is because the assignedTableMap used
by the Optimizer sets/unsets bits based on the referenced table map. So we end
up in a situation like this:
Join order: { SYSTABLES, SYSSCHEMAS, SpaceTable }
assignedTableMap: { SYSTABLES, SYSSCHEMAS, SpaceTable }
-- Pull SpaceTable from the join order. We do an XOR between
assignedTableMap and SpaceTable's referenced table map. Since
SpaceTable's referenced table (incorrectly) includes bits
for SYSSCHEMAS and SYSTABLES, the XOR yields:
Join order: { SYSTABLES, SYSSCHEMAS, -- }
assignedTableMap: { } // this is WRONG!
Note how the assignedTableMap is now empty. This is wrong--
after the pull there are still two tables in the join order,
so assignedTablemap should reflect those tables. But due
to the FromVTI's incorrect referenced map, assignedTableMap
no longer matches the join order.
-- Pull SYSSCHEMAS from the join order. We do an XOR between
assignedTableMap and SYSSCHEMA's referenced table map,
yielding:
Join order: { SYSTABLES, --, -- }
assignedTableMap: { SYSSCHEMAS }
Note how assignedTableMap is STILL wrong here. Since we
(incorrectly) cleared assignedTableMap above, the XOR with
SYSSCHEMAS actually _added_ SYSSCHEMAS to the assigned
table map, which is the opposite of what were supposed
to do. Since SYSSCHEMAS is no longer in the join order,
it should not show up in assignedTableMap.
-- Attempt to place SpaceTable in the join order. Since it
depends on { SYSSCHEMAS, SYSTABLES } and assignedTableMap
does not include both of those, we can't place it. So
we'll pull again...
-- Pull SYSTABLES from the join order. We do an XOR between
assignedTableMap and SYSTABLES's referenced table map,
yielding:
Join order: { --, --, -- }
assignedTableMap: { SYSSCHEMAS, SYSTABLES }
So our assignedTableMap is completely wrong at this point.
Since we (incorrectly) cleared assignedTableMap when we
pulled SpaceTable, our assigned table map now indicates that
SYSSCHEMAS and SYSTABLES are both already in the join order,
when in truth the join order is EMPTY at this point.
-- Attempt to place SpaceTable in the join order. Since it
depends on { SYSSCHEMAS, SYSTABLES } and since assignedTableMap
incorrectly includes both of those, the optimizer will place
SpaceTable into the first position in the join order. We
end up with:
Join order: { SpaceTable, --, -- }
assignedTableMap: { SYSSCHEMAS, SYSTABLES, SpaceTable }
This becomes the "cheapest" join order that the optimizer finds, so this is
what Derby will try to generate. But since SpaceTable depends on SYSSCHEMAS
and SYSTABLES, which will only be generated _after_ SpaceTable (because
FromTables are generated in the order in which they appear in the join order),
the generation code for SpaceTable will fail.
All of that to say that FromVTI should not include table numbers for its
arguments in its referencedTableMap. So with that in mind, the d3288_v2.patch
also does the following:
4. Fix FromVTI to correctly set up its referencedTableMap and
dependency map, without mixing the two together.
5. Remove some assignedTableMap "clear" (XOR) logic that appears
to only have been necessary due to the bug in FromVTI--at least
as far as I can tell. Removing this logic makes it so that
there is now a _single_ place in the code where we "add to" the
assigned table map (namely, when placing an optimizable), and
there is a _single_ place where we "remove from" the assigned
table map (namely, when pulling an optimizable).
6. Add the repro for this issue to the existing lang/subqueryFlattening.sql
tests, since that's where EXISTS flattening is currently tested.
I ran derbyall and suites.All with an earlier version of this patch and
everything ran cleanly. Since then I have added a few ASSERT statements to
sanity check the contents of assignedTableMap at certain points during the join
order processing. I have not yet run the regression suites with these ASSERTs
in place, but plan to do so shortly.
In the meantime, if anyone has any comments on the patch as posted, I'd be glad
to hear them.
Note to reviewers: The patch itself is 929 lines, but the vast majority of that
comes from the relocation of "pull optimizable" processing into its own method.
> wrong query result in presence of a unique index
> ------------------------------------------------
>
> Key: DERBY-3288
> URL: https://issues.apache.org/jira/browse/DERBY-3288
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.2.1.6, 10.2.2.0, 10.3.1.4, 10.3.2.1, 10.4.0.0
> Reporter: Gerald Khin
> Assignee: A B
> Attachments: d3288_incomplete_v1.patch, d3288_v2.patch, DERBY-3288.htm
>
>
> The DDL to reproduce the bug is:
> CREATE TABLE tab_a (PId BIGINT NOT NULL);
> CREATE TABLE tab_c (Id BIGINT NOT NULL PRIMARY KEY, PAId BIGINT NOT NULL,
> PBId BIGINT NOT NULL);
> INSERT INTO tab_c VALUES (91, 81, 82);
> INSERT INTO tab_c VALUES (92, 81, 84);
> INSERT INTO tab_c VALUES (93, 81, 88);
> INSERT INTO tab_c VALUES (96, 81, 83);
> CREATE TABLE tab_v (OId BIGINT NOT NULL , UGId BIGINT NOT NULL, val CHAR(1)
> NOT NULL);
> CREATE UNIQUE INDEX tab_v_i1 ON tab_v (OId, UGId, val);
> CREATE INDEX tab_v_i2 ON tab_v (UGId, val, OId);
> INSERT INTO tab_v VALUES (81, 31, 'A');
> INSERT INTO tab_v VALUES (82, 31, 'A');
> INSERT INTO tab_v VALUES (83, 31, 'A');
> INSERT INTO tab_v VALUES (84, 31, 'A');
> INSERT INTO tab_v VALUES (85, 31, 'A');
> INSERT INTO tab_v VALUES (86, 31, 'A');
> INSERT INTO tab_v VALUES (87, 31, 'A');
> INSERT INTO tab_v VALUES (81, 32, 'A');
> INSERT INTO tab_v VALUES (82, 32, 'A');
> INSERT INTO tab_v VALUES (83, 32, 'A');
> INSERT INTO tab_v VALUES (84, 32, 'A');
> INSERT INTO tab_v VALUES (85, 32, 'A');
> INSERT INTO tab_v VALUES (86, 32, 'A');
> INSERT INTO tab_v VALUES (87, 32, 'A');
> CREATE TABLE tab_b (Id BIGINT NOT NULL PRIMARY KEY, OId BIGINT NOT NULL);
> INSERT INTO tab_b VALUES (141, 81);
> INSERT INTO tab_b VALUES (142, 82);
> INSERT INTO tab_b VALUES (143, 84);
> INSERT INTO tab_b VALUES (144, 88);
> INSERT INTO tab_b VALUES (151, 81);
> INSERT INTO tab_b VALUES (152, 83);
> CREATE TABLE tab_d (Id BIGINT NOT NULL PRIMARY KEY, PAId BIGINT NOT NULL,
> PBId BIGINT NOT NULL);
> INSERT INTO tab_d VALUES (181, 141, 142);
> INSERT INTO tab_d VALUES (182, 141, 143);
> INSERT INTO tab_d VALUES (186, 151, 152);
> The query returning the wrong result is:
> SELECT tab_b.Id
> FROM tab_b JOIN tab_c ON (tab_b.OId = tab_c.PAId OR tab_b.OId = tab_c.PBId)
> LEFT OUTER JOIN tab_a ON tab_b.OId = PId
> WHERE EXISTS (SELECT 'X' FROM tab_d WHERE (PAId = 141 AND PBId = tab_b.Id) OR
> (PBId = 141 AND PAId = tab_b.Id))
> AND EXISTS (SELECT 'X' FROM tab_v WHERE OId = tab_b.OId AND UGId = 31 AND
> val = 'A')
> The result should consist of two rows (142),(143), but it returns only one
> row (142).
> The correct result would be returned if the index tab_v_i1 had been created
> as non-unique.
> The correct result would also be returned if the condition ...AND val='A' had
> been replaced by ...AND val='A' || ''.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.