[
https://issues.apache.org/jira/browse/DERBY-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552306
]
A B commented on DERBY-3279:
----------------------------
In the absence of IN list "multi-probing", the way Derby implements an IN list
is:
while (more rows in the underlying result set)
{
get next row from the result set
search IN list values to see if target column of row exists in the IN list
if target column of the row exists in the IN list, return the row
}
Since the loop is based on scanning the underlying result set for rows, any
rows which we return will be returned in the order they are received from the
underlying result set. So if, for example, the underlying result is an index
scan, and if the index is defined as DESCENDING, the rows will come back in
descending order.
With multi-probing enabled, which only happens after DERBY-47 and only happens
for index scans (which is why the ordering is correct if the index is removed,
per Ajay's description), the processing is (loosely):
while (more values in the IN list)
{
get next IN list value
probe underlying index for a row whose target column matches the current
IN list value
if index probe returned a row, return that row
}
Notice how, here, the loop is driven by the order of the values in the *IN
list*, instead of by the order of the rows as they are returned from the index
scan. So even if the index is defined as "DESC", the final ordering of the
rows will match the ordering of the IN values.
That said, in order to ensure proper handling of duplicate IN list values, the
current multi-probe logic in Derby always sorts IN list values in ASCENDING
order. So unless an explicit SortResultSet is generated for execution time,
the rows for an IN-list multi-probe will always be in ASCENDING order.
The final relevant piece of information is that the optimizer is smart enough
to remove unnecessary sorts when a) there is an ORDER BY on some index column
IC, b) the final access plan uses that index, and c) the ASC/DESC option of the
ORDER BY matches the ASC/DESC option of the index column IC. For the example
queries, we have "ORDER BY CHEESE_CODE DESC", and we have an index definition
which includes "CHEESE_CODE DESC". So the optimizer thinks the sort is
unnecessary and removes it. Hence this Jira.
There are two fixes that come to mind right away:
1. Make the optimizer recognize that the sort cannot be removed if we're
multi-probing, or
2. Make MultiProbeTableScanResultSet sort the IN list values in ascending OR
descending order depending on the ORDER BY clause (instead of always sorting in
ASCENDING).
I think option #2 is the best option here. I have a fix in progress and will
post when it is postable...
> Derby 10.3.X sorts VARCHAR column incorrectly when table is indexed and
> VARCHAR column contains a value of '00000'
> ------------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-3279
> URL: https://issues.apache.org/jira/browse/DERBY-3279
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.3.1.4, 10.3.2.1
> Environment: Rational Application Developer 7.0.0.2 (Eclipse 3.2.2),
> J2RE 1.5.0 IBM J9 2.3 Windows XP
> Reporter: Ajay Bhala
>
> Running the following produces the error seen in Derby 10.3.X but not in
> 10.2.X nor in 10.1.X.
> Don't know if this related to DERBY-3231.
> First query is incorrectly sorted whereas the second one is okay when there
> is an index on the table.
> If the table is not indexed, the sort works correctly in DESC order.
> ------
> create table CHEESE (
> CHEESE_CODE VARCHAR(5),
> CHEESE_NAME VARCHAR(20),
> CHEESE_COST DECIMAL(7,4)
> );
> create index cheese_index on CHEESE (CHEESE_CODE DESC, CHEESE_NAME DESC,
> CHEESE_COST DESC);
> INSERT INTO CHEESE (
> CHEESE_CODE,
> CHEESE_NAME,
> CHEESE_COST)
> VALUES ('00000', 'GOUDA', 001.1234),
> ('00000', 'EDAM', 002.1111),
> ('54321', 'EDAM', 008.5646),
> ('12345', 'GORGONZOLA', 888.2309),
> ('AAAAA', 'EDAM', 999.8888),
> ('54321', 'MUENSTER', 077.9545);
> SELECT * FROM CHEESE
> WHERE (CHEESE_CODE='00000' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
> ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;
> SELECT * FROM CHEESE
> WHERE (CHEESE_CODE='AAAAA' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
> ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.