[jira] Commented: (DERBY-3279) Derby 10.3.X sorts VARCHAR column incorrectly when table is indexed and VARCHAR column contains a value of '00000'

A B (JIRA) Sun, 16 Dec 2007 16:54:03 -0800

    [ 
https://issues.apache.org/jira/browse/DERBY-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552306
 ]


A B commented on DERBY-3279:
----------------------------

In the absence of IN list "multi-probing", the way Derby implements an IN list 
is:

  while (more rows in the underlying result set)
  {
      get next row from the result set
      search IN list values to see if target column of row exists in the IN list
      if target column of the row exists in the IN list, return the row
  }

Since the loop is based on scanning the underlying result set for rows, any 
rows which we return will be returned in the order they are received from the 
underlying result set.  So if, for example, the underlying result is an index 
scan, and if the index is defined as DESCENDING, the rows will come back in 
descending order.

With multi-probing enabled, which only happens after DERBY-47 and only happens 
for index scans (which is why the ordering is correct if the index is removed, 
per Ajay's description), the processing is (loosely):

  while (more values in the IN list)
  {
      get next IN list value
      probe underlying index for a row whose target column matches the current 
IN list value
      if index probe returned a row, return that row
  }

Notice how, here, the loop is driven by the order of the values in the *IN 
list*, instead of by the order of the rows as they are returned from the index 
scan.  So even if the index is defined as "DESC", the final ordering of the 
rows will match the ordering of the IN values.

That said, in order to ensure proper handling of duplicate IN list values, the 
current multi-probe logic in Derby always sorts IN list values in ASCENDING 
order.  So unless an explicit SortResultSet is generated for execution time, 
the rows for an IN-list multi-probe will always be in ASCENDING order.

The final relevant piece of information is that the optimizer is smart enough 
to remove unnecessary sorts when a) there is an ORDER BY on some index column 
IC, b) the final access plan uses that index, and c) the ASC/DESC option of the 
ORDER BY matches the ASC/DESC option of the index column IC.  For the example 
queries, we have "ORDER BY CHEESE_CODE DESC", and we have an index definition 
which includes "CHEESE_CODE DESC".  So the optimizer thinks the sort is 
unnecessary and removes it.  Hence this Jira.

There are two fixes that come to mind right away:

  1. Make the optimizer recognize that the sort cannot be removed if we're 
multi-probing, or
  2. Make MultiProbeTableScanResultSet sort the IN list values in ascending OR 
descending order depending on the ORDER BY clause (instead of always sorting in 
ASCENDING).

I think option #2 is the best option here.  I have a fix in progress and will 
post when it is postable...

> Derby 10.3.X sorts VARCHAR column incorrectly when table is indexed and 
> VARCHAR column contains a value of '00000'
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3279
>                 URL: https://issues.apache.org/jira/browse/DERBY-3279
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.3.2.1
>         Environment: Rational Application Developer 7.0.0.2 (Eclipse 3.2.2), 
> J2RE 1.5.0 IBM J9 2.3 Windows XP
>            Reporter: Ajay Bhala
>
> Running the following produces the error seen in Derby 10.3.X but not in 
> 10.2.X nor in 10.1.X.
> Don't know if this related to DERBY-3231.
> First query is incorrectly sorted whereas the second one is okay when there 
> is an index on the table. 
> If the table is not indexed, the sort works correctly in DESC order.
> ------
> create table CHEESE (
>   CHEESE_CODE       VARCHAR(5),
>   CHEESE_NAME       VARCHAR(20),
>   CHEESE_COST       DECIMAL(7,4)
> );
> create index cheese_index on CHEESE (CHEESE_CODE DESC, CHEESE_NAME DESC, 
> CHEESE_COST DESC);
> INSERT INTO CHEESE (
>   CHEESE_CODE,
>   CHEESE_NAME,
>   CHEESE_COST)
> VALUES ('00000', 'GOUDA', 001.1234),
>        ('00000', 'EDAM', 002.1111),
>        ('54321', 'EDAM', 008.5646),
>        ('12345', 'GORGONZOLA', 888.2309),
>        ('AAAAA', 'EDAM', 999.8888),
>        ('54321', 'MUENSTER', 077.9545);
> SELECT * FROM CHEESE 
> WHERE (CHEESE_CODE='00000' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
> ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;
> SELECT * FROM CHEESE 
> WHERE (CHEESE_CODE='AAAAA' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
> ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3279) Derby 10.3.X sorts VARCHAR column incorrectly when table is indexed and VARCHAR column contains a value of '00000'

Reply via email to