[ 
http://issues.apache.org/jira/browse/DERBY-47?page=comments#action_12432636 ] 
            
James Synge commented on DERBY-47:
----------------------------------

I spent much of last week tracking down a performance problem in an application 
that I'm working on, and it turned out to be described here.

I developed a test application (that I will attach) that explores different 
ways of doing essentially this query:

        SELECT * FROM myTable WHERE foreignKeyColumn IN (?, ..., ?)

We tried the following strategies:

Literals        - 1 query, using WHERE column IN ('literal1', ..., 'literalN')
Literal         - N queries, using WHERE column = 'literal[i]
Markers         - 1 query, using WHERE column IN (?, ..., ?)
Marker          - N queries, using WHERE column = ?
TempTable       - 1 query, store parameters in a temp table, use nested select, 
then delete parameters
ScratchPad      - 1 query, store parameters in a table, use nested select, then 
delete parameters
ScrSavepoint    - 1 query, set savepoint, store parameters in a table, use 
nested select, then rollback savepoint

We were astonished to find that converting the query to:

        SELECT * FROM myTable WHERE foreignKeyColumn = ?

And repeating that query N times was by far the best performer out of 7 
different strategies I tried.  (This is what I call the Marker strategy above.)

Here are the results for a table with 100,000 rows in it, and then after that 
for 1,000,000 rows:
(Note: this table is tab delimited, for easy importing into Excel.  I'll also 
attach this data.)

        Literals                Literal         Markers         Marker          
TempTable               ScratchPad              ScrSavepoint            
ID Count        Total ms        Avg ms  Total ms        Avg ms  Total ms        
Avg ms  Total ms        Avg ms  Total ms        Avg ms  Total ms        Avg ms  
Total ms        Avg ms
1       20      20      10      10      10      10      0       0       1232    
1232    1132    1132    1022    1022
2       881     440     20      10      450     225     0       0       1022    
511     1041    520     1042    521
3       1051    350     30      10      2794    931     0       0       1022    
340     1022    340     1012    337
4       1012    253     40      10      2013    503     0       0       1032    
258     1002    250     1202    300
5       1132    226     40      8       2053    410     0       0       1032    
206     1022    204     1042    208
6       1042    173     50      8       1523    253     0       0       1012    
168     1022    170     1051    175
7       1132    161     60      8       3145    449     0       0       1022    
146     1032    147     1112    158
8       1102    137     60      7       3034    379     10      1       1062    
132     1202    150     1082    135
9       1102    122     60      6       2965    329     0       0       1142    
126     1151    127     992     110
10      1112    111     70      7       3526    352     0       0       1022    
102     1052    105     1062    106
20      1142    57      120     6       3746    187     10      0       1022    
51      1112    55      1232    61
30      1317    43      195     6       4117    137     10      0       1022    
34      1082    36      1072    35
40      1252    31      250     6       4417    110     20      0       1022    
25      1091    27      1282    32
50      1292    25      320     6       4777    95      20      0       1062    
21      1052    21      1052    21
60      1327    22      350     5       5068    84      20      0       1062    
17      1082    18      1112    18
70      1332    19      415     5       5504    78      30      0       1042    
14      1142    16      1081    15
80      1327    16      471     5       5769    72      40      0       1041    
13      1052    13      1277    15
90      1362    15      481     5       6330    70      40      0       1052    
11      1152    12      1092    12
100     1372    13      536     5       6460    64      40      0       1283    
12      1092    10      1202    12


=============================================================================================================================

        Literals                Literal         Markers         Marker          
TempTable               ScratchPad              ScrSavepoint            
ID Count        Total ms        Avg ms  Total ms        Avg ms  Total ms        
Avg ms  Total ms        Avg ms  Total ms        Avg ms  Total ms        Avg ms  
Total ms        Avg ms
1       160     160     70      70      40      40      41      41      69841   
69841   44261   44261   36699   36699
2       44120   22060   181     90      222124  111062  0       0       8624    
4312    6851    3425    6580    3290
3       10958   3652    120     40      12540   4180    10      3       6461    
2153    6431    2143    6520    2173



> Some possible improvements to IN optimization
> ---------------------------------------------
>
>                 Key: DERBY-47
>                 URL: http://issues.apache.org/jira/browse/DERBY-47
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.0.2.0
>         Environment: all
>            Reporter: Sunitha Kambhampati
>         Attachments: QueryPlanUniqueIndexAndWordIndexOneTerm.txt, 
> QueryPlanUniqueIndexAndWordIndexTwoTerms.txt, 
> QueryPlanUniqueIndexOnlyOneTerm.txt, QueryPlanUniqueIndexOnlyTwoTerms.txt
>
>
> Consider a simple case of  - 
> A table tbl has 10000 rows, there is a primary key index on i1
> and the query in question is 
>  select * from tbl where i1 in (-1,100000)
> derby does a table scan of the entire table even though the "IN" list has 
> only two values and the comparison is on a field that has an index.
> Briefly looking at the code, it seems like we insert a between and use the IN 
> list to get the start and stop values for the scan. Thus the range of the 
> values in the "IN" list here plays an important role. 
> Thus if the query was changed to select * from tbl where i1 in (-1, 1), an 
> index scan would be chosen.
> It would be nice if we could do something clever in this case where there is 
> clearly an index on the field and the number of values in the IN list is 
> known. Maybe use the rowcount estimate and the IN list size to do some 
> optimizations.  
> - consider the length of the "IN" list to do searches on the table.  ie use 
> the IN list values to do index key searches on the table,
> -or try to convert it to a join. Use the "IN" list values to create a 
> temporary table and do a join. It is most likely that the optimizer will 
> choose the table with "IN" list here as the outer table in the join and thus 
> will do key searches on the larger table. 
> -------------------------------------------------------------------
> some query plans that I logged using derby.language.logQueryPlan=true for 
> some similar queries:
> Table has ascending values from 0 - 9999 for i1. primary key index on i1.
> GMT Thread[UT0,5,main] (XID = 19941), (SESSIONID = 0), select * from 
> scanfixed where i1 in (-1,9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) 
> ******* Project-Restrict ResultSet (2):
> Number of opens = 1
> Rows seen = 10000
> Rows filtered = 9990
> restriction = true
> projection = false
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count:          750.38
>       optimizer estimated cost:         8579.46
> Source result set:
>       Table Scan ResultSet for SCANFIXED at read committed isolation level 
> using instantaneous share row locking chosen by the optimizer
>       Number of opens = 1
>       Rows seen = 10000
>       Rows filtered = 0
>       Fetch Size = 16
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               next time in milliseconds/row = 0
>       scan information: 
>               Bit set of columns fetched=All
>               Number of columns fetched=9
>               Number of pages visited=417
>               Number of rows qualified=10000
>               Number of rows visited=10000
>               Scan type=heap
>               start position: 
> null          stop position: 
> null          qualifiers:
> Column[0][0] Id: 0
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> Column[0][1] Id: 0
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
>               optimizer estimated row count:          750.38
>               optimizer estimated cost:         8579.46
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> l
> 2004-10-14 18:59:47.577 GMT Thread[UT0,5,main] (XID = 19216), (SESSIONID = 
> 0), select * from scanfixed where i1 in 
> (9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict 
> ResultSet (3):
> Number of opens = 1
> Rows seen = 10
> Rows filtered = 0
> restriction = true
> projection = true
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count:            4.80
>       optimizer estimated cost:           39.53
> Source result set:
>       Index Row to Base Row ResultSet for SCANFIXED:
>       Number of opens = 1
>       Rows seen = 10
>       Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6, 7, 8}
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               optimizer estimated row count:            4.80
>               optimizer estimated cost:           39.53
>               Index Scan ResultSet for SCANFIXED using index SCANFIXEDX at 
> read committed isolation level using instantaneous share row locking chosen 
> by the optimizer
>               Number of opens = 1
>               Rows seen = 10
>               Rows filtered = 0
>               Fetch Size = 16
>                       constructor time (milliseconds) = 0
>                       open time (milliseconds) = 0
>                       next time (milliseconds) = 0
>                       close time (milliseconds) = 0
>                       next time in milliseconds/row = 0
>               scan information: 
>                       Bit set of columns fetched=All
>                       Number of columns fetched=2
>                       Number of deleted rows visited=0
>                       Number of pages visited=2
>                       Number of rows qualified=10
>                       Number of rows visited=10
>                       Scan type=btree
>                       Tree height=2
>                       start position: 
>       >= on first 1 column(s).
>       Ordered null semantics on the following columns: 
>                       stop position: 
>       > on first 1 column(s).
>       Ordered null semantics on the following columns: 
>                       qualifiers:
> None
>                       optimizer estimated row count:            4.80
>                       optimizer estimated cost:           39.53

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to