[ 
http://issues.apache.org/jira/browse/DERBY-713?page=comments#action_12357867 ] 

Daniel James Neades commented on DERBY-713:
-------------------------------------------

Creating a clone probably wasn't the right thing to do, but given the extra 
information now added to DERBY-47, it should be of type defect and not merely 
an improvement. My apologies if I've done the wrong thing, but perhaps a JIRA 
admin can sort this out and change DERBY-47's type and summary?

The extra comments on DERBY-47 describe how the query optimizer is making very 
poor choices when dealing with IN clauses with multiple terms (and equivalent 
"WHERE thing=x OR thing=y OR thing=z" expressions). This makes Derby 
performance very poor, even for some simple queries.

In addition, the optimizer seems to use inappropriate indexes in certain 
circumstances (again, described in comments added to DERBY-47), meaning that 
performance can degrade significantly degrade, merely by the presence of an 
additional index on a table. This means adding indexes to a table in an attempt 
to improve one query can unexpectedly degrade the performance of other queries. 
I believe that this should be considered a major defect.


> CLONE - Query optimizer should not make poor choices when optimizing IN and 
> WHERE clauses
> -----------------------------------------------------------------------------------------
>
>          Key: DERBY-713
>          URL: http://issues.apache.org/jira/browse/DERBY-713
>      Project: Derby
>         Type: Improvement
>   Components: SQL
>     Versions: 10.0.2.0
>  Environment: all
>     Reporter: Daniel James Neades

>
> Consider a simple case of  - 
> A table tbl has 10000 rows, there is a primary key index on i1
> and the query in question is 
>  select * from tbl where i1 in (-1,100000)
> derby does a table scan of the entire table even though the "IN" list has 
> only two values and the comparison is on a field that has an index.
> Briefly looking at the code, it seems like we insert a between and use the IN 
> list to get the start and stop values for the scan. Thus the range of the 
> values in the "IN" list here plays an important role. 
> Thus if the query was changed to select * from tbl where i1 in (-1, 1), an 
> index scan would be chosen.
> It would be nice if we could do something clever in this case where there is 
> clearly an index on the field and the number of values in the IN list is 
> known. Maybe use the rowcount estimate and the IN list size to do some 
> optimizations.  
> - consider the length of the "IN" list to do searches on the table.  ie use 
> the IN list values to do index key searches on the table,
> -or try to convert it to a join. Use the "IN" list values to create a 
> temporary table and do a join. It is most likely that the optimizer will 
> choose the table with "IN" list here as the outer table in the join and thus 
> will do key searches on the larger table. 
> -------------------------------------------------------------------
> some query plans that I logged using derby.language.logQueryPlan=true for 
> some similar queries:
> Table has ascending values from 0 - 9999 for i1. primary key index on i1.
> GMT Thread[UT0,5,main] (XID = 19941), (SESSIONID = 0), select * from 
> scanfixed where i1 in (-1,9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) 
> ******* Project-Restrict ResultSet (2):
> Number of opens = 1
> Rows seen = 10000
> Rows filtered = 9990
> restriction = true
> projection = false
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count:          750.38
>       optimizer estimated cost:         8579.46
> Source result set:
>       Table Scan ResultSet for SCANFIXED at read committed isolation level 
> using instantaneous share row locking chosen by the optimizer
>       Number of opens = 1
>       Rows seen = 10000
>       Rows filtered = 0
>       Fetch Size = 16
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               next time in milliseconds/row = 0
>       scan information: 
>               Bit set of columns fetched=All
>               Number of columns fetched=9
>               Number of pages visited=417
>               Number of rows qualified=10000
>               Number of rows visited=10000
>               Scan type=heap
>               start position: 
> null          stop position: 
> null          qualifiers:
> Column[0][0] Id: 0
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> Column[0][1] Id: 0
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
>               optimizer estimated row count:          750.38
>               optimizer estimated cost:         8579.46
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> l
> 2004-10-14 18:59:47.577 GMT Thread[UT0,5,main] (XID = 19216), (SESSIONID = 
> 0), select * from scanfixed where i1 in 
> (9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict 
> ResultSet (3):
> Number of opens = 1
> Rows seen = 10
> Rows filtered = 0
> restriction = true
> projection = true
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count:            4.80
>       optimizer estimated cost:           39.53
> Source result set:
>       Index Row to Base Row ResultSet for SCANFIXED:
>       Number of opens = 1
>       Rows seen = 10
>       Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6, 7, 8}
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               optimizer estimated row count:            4.80
>               optimizer estimated cost:           39.53
>               Index Scan ResultSet for SCANFIXED using index SCANFIXEDX at 
> read committed isolation level using instantaneous share row locking chosen 
> by the optimizer
>               Number of opens = 1
>               Rows seen = 10
>               Rows filtered = 0
>               Fetch Size = 16
>                       constructor time (milliseconds) = 0
>                       open time (milliseconds) = 0
>                       next time (milliseconds) = 0
>                       close time (milliseconds) = 0
>                       next time in milliseconds/row = 0
>               scan information: 
>                       Bit set of columns fetched=All
>                       Number of columns fetched=2
>                       Number of deleted rows visited=0
>                       Number of pages visited=2
>                       Number of rows qualified=10
>                       Number of rows visited=10
>                       Scan type=btree
>                       Tree height=2
>                       start position: 
>       >= on first 1 column(s).
>       Ordered null semantics on the following columns: 
>                       stop position: 
>       > on first 1 column(s).
>       Ordered null semantics on the following columns: 
>                       qualifiers:
> None
>                       optimizer estimated row count:            4.80
>                       optimizer estimated cost:           39.53

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

Reply via email to