[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Mamta A. Satoor (JIRA) Mon, 04 May 2009 09:56:56 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705646#action_12705646
 ]


Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I went through the optimize phase through the debugger and it appears to me (I 
may be wrong and would appreciate others looking at my detail analysis of the 
optimize phase below) that the problem may be with the generate phase or the 
execute phase where we may be not using the non-unique index on table2 
correctly to fetch the orders in row.

The query in question is as below
SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES 
joinOrder=FIXED
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
, table1
WHERE table1.id=table2.id AND table2.name='PageSequenceId' 
AND table1.id=table3.id 
AND table3.name='PostComponentId' 
AND table3.value='21857' ORDER BY table2.value;

For the query above, in addition to the predicates supplied by the user, 
optimizer internally generates another predicate, namely, table3.id=table2.id
So for the queyr, all the predicates are as follows
1)table1.id=table2.id 
2)table1.id=table3.id 
3)table3.id=table2.id
4)table2.name='PageSequenceId' 
5)table3.name='PostComponentId' 
6)table3.value='21857' 

Of the predicates above, 4), 5) and 6) can be pushed down to the corresponding 
optimizables ie 4) will be associated with table2 and 5),6) will be associated 
with table3. This is because these predicates are constant comparison with 
columns. This leaves us with 3 predicates, namely 1), 2), 3)
which are multitable join predicates.

Optimizer has a class called RowOrdering associated with it 
(OptimizerImpl.currentRowOrdering). 
currentRowOrdering has following fields in it.
currentRowOrdering      RowOrderingImpl  
        alwaysOrderedOptimizables       Vector<E>
        columnsAlwaysOrdered    ColumnOrdering
        currentColumnOrdering   null    
        ordering        Vector<E>  
        unorderedOptimizables   Vector<E>  
All the predicates that are constant comparison will go into 
columnsAlwaysOrdered. These pushing of constant comparison predicates happen 
per optimizable basis when that particular optimizable is being consdiered in 
the possible join order combination.

In our specific query, through optimizer overrides, we have instructed 
optimizer to only consider join order [table3, table2, table1]. The optimizer 
starts with [table3, -1, -1]. First thing it does is it goes through the join 
predicates (which are 1), 2) and 3) in the predicate list above). But since
all the referenced tables for any of the 3 predicates are not covered by the 
current join order of [table3, -1, -1], nothing gets done to those join 
predicates. Next, the optimizer will tell 
currentRowOrdering to (this happens in 
FromBaseTable(FromTable).tellRowOrderingAboutConstantColumns(RowOrdering, 
OptimizablePredicateList) line: 1477) to add predicates 5) and 6) from above 
list into it's columnsAlwaysOrdered list. So, at the end of the
[table3, -1, -1], currentRowOrdering.columnsAlwaysOrdered will look as follows 
Direction: 3 Table 0, Column 3 Table 0, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has 
column 3(value) and column 2(name) which are always ordered because they are 
being compared with constants. So far, the logic for currentRowOrdering seems 
to be working fine. Next, we have asked the optimizer to use the index 
index=nonUniqueOnValue_Table3 on Table3. This index covers the predicate 6) 
since that predicate is on the same column on which the index is created but it 
does not cover the other columns from table3 that are being referenced in this 
query (which table3.id and table3.name). Because of this, we determine that the 
index being considered is not a covering index. The code to determine whether 
the sorting can be avoided for [table3, -1, -1], is in 
OrderByList.sortRequired(RowOrdering, JBitSet) method. Since order by is on 
table2.value, the order by column's table does not match with table3 and hecne 
we determine that sorting is not required based on what optimizer has seen so 
far. So it appears that we leave it to table2 when its turn comes in the join 
order to decide whether sorting should indeed be avoided or not.

Next we consider the join order [table3, table2, -1]. For table2, we have asked 
the optimizer to use index=nonUniqueOnValue_Table2. First thing that we do is 
go through the join predicates 1), 2) and 3). Predicate number 3) which is 
TABLE3.ID = TABLE2.ID can be pushed down to optimizable table2 because the 
current join order [table3, table2, -1] includes the tables referenced by 
predicate 3). So, at this point, there are 2 predicates pushed down to table3, 
they are number 5) and 6). And for table2, there are 2 prdicates pushed down to 
it, they are number 3) and 4). Also, since predicate 4) is a constant 
comparison, it will get added to currentRowOrdering. At this point, 
currentRowOrdering.columnsAlwaysOrdered will look as folows
Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has 
column 3(value) and column 2(name) which are always ordered because they are 
being compared with constants. In addition, Table at position 1(which is Table2 
in our join order) has column 2 which is always ordered because it is being 
compared with constant. Next, we have asked the optimizer to use the index 
nonUniqueOnValue_Table2 but it does not cover the constant comparison predicate 
4) since that predicate is on column name and not value. Notice, this is a 
different code path we are following for table2 compared to table3 above. 
Because table3.value is not already an ordered column in currentRowOrdering 
because there is no
constant comparison predicate on it, we add it to the "ordering " vector in 
currentOrdering object. This is the first object that gets added to the 
currentRowOrdering."ordering" vector in our eg. So, at this point, the 
currentRowOrdering has only 3 of it's fields propulated and they are as follows
columnsAlwaysOrdered    ColumnOrdering
        Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering   ColumnOrdering 
        Direction: 1 Table 1, Column 3
ordering        Vector<E>  
        [Direction: 1 Table 1, Column 3]
The index nonUniqueOnValue_Table2 does not cover any predicate on Table2 and it 
does not cover all the column from table2 that are being referenced in this 
query and hence it is not a covering index. Next, the code to determine whether 
sort can be avoided for join order [table3, table2, -1], we go through the code 
path in OrderByList.sortRequired(RowOrdering, JBitSet) method. We find that the 
order by column's table matches with table2 in join order. Because of this 
match, we need to look at currentRowOrdering to see if it will take care of the 
sorting and if so we can avoid the sort. To look into currentRowOrdering, we 
first call currentRowOrdering.alwaysOrdered(cr.getTableNumber()) (in this call, 
cr is the order by column). So, we are checking if table2 is always ordered in 
currentRowOrdering. Since table2 is not always ordered in this query, this 
check returns false. Next, we check if not the entire table, is the order by 
table.order by column combination always ordered in currentRowOrdering. In our 
query, that will be table2.value Since there is no constant comparison 
predicate on table2.value, it is not going to be in columnsAlwaysOrdered vector 
in currentRowOrdering.
For reference, currentRowOrdering looks as follws
columnsAlwaysOrdered    ColumnOrdering
        Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering   ColumnOrdering 
        Direction: 1 Table 1, Column 3
ordering        Vector<E>  
        [Direction: 1 Table 1, Column 3]
As we can see from currentRowOrdering object above, columnsAlwaysOrdered does 
not include Table 1, Column 3. So, we have not found table2 to be always 
ordered and we have not found table2.value to be always ordered either. The 
last place to check is the ordering vector in columnsAlwaysOrdered. This vector 
does include Table 1, Column 3 which is table2.value and hence we determine 
that sorting is not needed to table2. All this code of checking the 
columnsAlwaysOrdered happens in OrderByList.sortRequired(RowOrdering, JBitSet). 
Assuming that this code is working as intended, I think then the culprit might 
be when we generate the code. The only step in optimize left is to add table1 
to the join order. So, at the end of the optimize phase, the join order will 
look as follows [table2, table2, table1] and currentRowOrdering looks as follows
columnsAlwaysOrdered    ColumnOrdering
        Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering   ColumnOrdering 
        Direction: 1 Table 2, Column 1
ordering        Vector<E>  
        [Direction: 1 Table 1, Column 3, Direction: 1 Table 2, Column 1]

The only change to currentRowOrdering that is caused by adding of table1 in 
third join order position is that we are going to use primary key on table1 and 
hence we need to reflect that in currentRowOrdering by adding it to the 
ordering vector.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, script3.sql, 
> script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large 
> column: VARCHAR(1024). I know it  is generally not a good idea to have an 
> index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column 
> but the result is not sorted. It is sorted if I remove the index on that 
> column.
> The output of the attached script is as follows (results should be ordered on 
> the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value 
> VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the 
> ij-tool.
> Note that the result of the query becomes correct when enough data is 
> changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and 
> script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is 
> ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Reply via email to