[
https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705646#action_12705646
]
Mamta A. Satoor commented on DERBY-3926:
----------------------------------------
I went through the optimize phase through the debugger and it appears to me (I
may be wrong and would appreciate others looking at my detail analysis of the
optimize phase below) that the problem may be with the generate phase or the
execute phase where we may be not using the non-unique index on table2
correctly to fetch the orders in row.
The query in question is as below
SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES
joinOrder=FIXED
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
, table1
WHERE table1.id=table2.id AND table2.name='PageSequenceId'
AND table1.id=table3.id
AND table3.name='PostComponentId'
AND table3.value='21857' ORDER BY table2.value;
For the query above, in addition to the predicates supplied by the user,
optimizer internally generates another predicate, namely, table3.id=table2.id
So for the queyr, all the predicates are as follows
1)table1.id=table2.id
2)table1.id=table3.id
3)table3.id=table2.id
4)table2.name='PageSequenceId'
5)table3.name='PostComponentId'
6)table3.value='21857'
Of the predicates above, 4), 5) and 6) can be pushed down to the corresponding
optimizables ie 4) will be associated with table2 and 5),6) will be associated
with table3. This is because these predicates are constant comparison with
columns. This leaves us with 3 predicates, namely 1), 2), 3)
which are multitable join predicates.
Optimizer has a class called RowOrdering associated with it
(OptimizerImpl.currentRowOrdering).
currentRowOrdering has following fields in it.
currentRowOrdering RowOrderingImpl
alwaysOrderedOptimizables Vector<E>
columnsAlwaysOrdered ColumnOrdering
currentColumnOrdering null
ordering Vector<E>
unorderedOptimizables Vector<E>
All the predicates that are constant comparison will go into
columnsAlwaysOrdered. These pushing of constant comparison predicates happen
per optimizable basis when that particular optimizable is being consdiered in
the possible join order combination.
In our specific query, through optimizer overrides, we have instructed
optimizer to only consider join order [table3, table2, table1]. The optimizer
starts with [table3, -1, -1]. First thing it does is it goes through the join
predicates (which are 1), 2) and 3) in the predicate list above). But since
all the referenced tables for any of the 3 predicates are not covered by the
current join order of [table3, -1, -1], nothing gets done to those join
predicates. Next, the optimizer will tell
currentRowOrdering to (this happens in
FromBaseTable(FromTable).tellRowOrderingAboutConstantColumns(RowOrdering,
OptimizablePredicateList) line: 1477) to add predicates 5) and 6) from above
list into it's columnsAlwaysOrdered list. So, at the end of the
[table3, -1, -1], currentRowOrdering.columnsAlwaysOrdered will look as follows
Direction: 3 Table 0, Column 3 Table 0, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has
column 3(value) and column 2(name) which are always ordered because they are
being compared with constants. So far, the logic for currentRowOrdering seems
to be working fine. Next, we have asked the optimizer to use the index
index=nonUniqueOnValue_Table3 on Table3. This index covers the predicate 6)
since that predicate is on the same column on which the index is created but it
does not cover the other columns from table3 that are being referenced in this
query (which table3.id and table3.name). Because of this, we determine that the
index being considered is not a covering index. The code to determine whether
the sorting can be avoided for [table3, -1, -1], is in
OrderByList.sortRequired(RowOrdering, JBitSet) method. Since order by is on
table2.value, the order by column's table does not match with table3 and hecne
we determine that sorting is not required based on what optimizer has seen so
far. So it appears that we leave it to table2 when its turn comes in the join
order to decide whether sorting should indeed be avoided or not.
Next we consider the join order [table3, table2, -1]. For table2, we have asked
the optimizer to use index=nonUniqueOnValue_Table2. First thing that we do is
go through the join predicates 1), 2) and 3). Predicate number 3) which is
TABLE3.ID = TABLE2.ID can be pushed down to optimizable table2 because the
current join order [table3, table2, -1] includes the tables referenced by
predicate 3). So, at this point, there are 2 predicates pushed down to table3,
they are number 5) and 6). And for table2, there are 2 prdicates pushed down to
it, they are number 3) and 4). Also, since predicate 4) is a constant
comparison, it will get added to currentRowOrdering. At this point,
currentRowOrdering.columnsAlwaysOrdered will look as folows
Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has
column 3(value) and column 2(name) which are always ordered because they are
being compared with constants. In addition, Table at position 1(which is Table2
in our join order) has column 2 which is always ordered because it is being
compared with constant. Next, we have asked the optimizer to use the index
nonUniqueOnValue_Table2 but it does not cover the constant comparison predicate
4) since that predicate is on column name and not value. Notice, this is a
different code path we are following for table2 compared to table3 above.
Because table3.value is not already an ordered column in currentRowOrdering
because there is no
constant comparison predicate on it, we add it to the "ordering " vector in
currentOrdering object. This is the first object that gets added to the
currentRowOrdering."ordering" vector in our eg. So, at this point, the
currentRowOrdering has only 3 of it's fields propulated and they are as follows
columnsAlwaysOrdered ColumnOrdering
Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering ColumnOrdering
Direction: 1 Table 1, Column 3
ordering Vector<E>
[Direction: 1 Table 1, Column 3]
The index nonUniqueOnValue_Table2 does not cover any predicate on Table2 and it
does not cover all the column from table2 that are being referenced in this
query and hence it is not a covering index. Next, the code to determine whether
sort can be avoided for join order [table3, table2, -1], we go through the code
path in OrderByList.sortRequired(RowOrdering, JBitSet) method. We find that the
order by column's table matches with table2 in join order. Because of this
match, we need to look at currentRowOrdering to see if it will take care of the
sorting and if so we can avoid the sort. To look into currentRowOrdering, we
first call currentRowOrdering.alwaysOrdered(cr.getTableNumber()) (in this call,
cr is the order by column). So, we are checking if table2 is always ordered in
currentRowOrdering. Since table2 is not always ordered in this query, this
check returns false. Next, we check if not the entire table, is the order by
table.order by column combination always ordered in currentRowOrdering. In our
query, that will be table2.value Since there is no constant comparison
predicate on table2.value, it is not going to be in columnsAlwaysOrdered vector
in currentRowOrdering.
For reference, currentRowOrdering looks as follws
columnsAlwaysOrdered ColumnOrdering
Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering ColumnOrdering
Direction: 1 Table 1, Column 3
ordering Vector<E>
[Direction: 1 Table 1, Column 3]
As we can see from currentRowOrdering object above, columnsAlwaysOrdered does
not include Table 1, Column 3. So, we have not found table2 to be always
ordered and we have not found table2.value to be always ordered either. The
last place to check is the ordering vector in columnsAlwaysOrdered. This vector
does include Table 1, Column 3 which is table2.value and hence we determine
that sorting is not needed to table2. All this code of checking the
columnsAlwaysOrdered happens in OrderByList.sortRequired(RowOrdering, JBitSet).
Assuming that this code is working as intended, I think then the culprit might
be when we generate the code. The only step in optimize left is to add table1
to the join order. So, at the end of the optimize phase, the join order will
look as follows [table2, table2, table1] and currentRowOrdering looks as follows
columnsAlwaysOrdered ColumnOrdering
Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering ColumnOrdering
Direction: 1 Table 2, Column 1
ordering Vector<E>
[Direction: 1 Table 1, Column 3, Direction: 1 Table 2, Column 1]
The only change to currentRowOrdering that is caused by adding of table1 in
third join order position is that we are going to use primary key on table1 and
hence we need to reflect that in currentRowOrdering by adding it to the
ordering vector.
> Incorrect ORDER BY caused by index
> ----------------------------------
>
> Key: DERBY-3926
> URL: https://issues.apache.org/jira/browse/DERBY-3926
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
> Reporter: Tars Joris
> Assignee: Mamta A. Satoor
> Attachments: derby-reproduce.zip, script3.sql,
> script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large
> column: VARCHAR(1024). I know it is generally not a good idea to have an
> index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column
> but the result is not sorted. It is sorted if I remove the index on that
> column.
> The output of the attached script is as follows (results should be ordered on
> the middle column):
> ID |VALUE |VALUE
> ----------------------------------------------
> 2147483653 |000002 |21857
> 2147483654 |000003 |21857
> 4294967297 |000001 |21857
> While I would expect:
> ID |VALUE |VALUE
> ----------------------------------------------
> 4294967297 |000001 |21857
> 2147483653 |000002 |21857
> 2147483654 |000003 |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value
> VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the
> ij-tool.
> Note that the result of the query becomes correct when enough data is
> changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and
> script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is
> ignoring the order by clause."
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.