So for the second query: select * from orders where time > '10/01/2002' and time < '11/30/2002' order by order_id;
the query plan shows that the index IX_ORDERS_TIME is used to filter the result set by time. The order by step does not use the primary key index to sort the results after the filter step. My questions: --Is it correct that the sort step not use the primary key index in this case? --Why is it not possible to use the index on order_id to sort after the filter has happened? Here is the query plan: Statement Name: null Statement Text: select * from orders where time > '10/01/2002' and time < '11/30/2002' order by order_id Parse Time: 0 Bind Time: 0 Optimize Time: 0 Generate Time: 0 Compile Time: 0 Execute Time: 14329 Begin Compilation Timestamp : null End Compilation Timestamp : null Begin Execution Timestamp : 2005-09-19 09:20:06.171 End Execution Timestamp : 2005-09-19 09:20:20.5 Statement Execution Plan Text: Sort ResultSet: Number of opens = 1 Rows input = 166333 Rows returned = 1000 Eliminate duplicates = false In sorted order = false Sort information: Number of merge runs=1 Number of rows input=166333 Number of rows output=166333 Size of merge runs=[93695] Sort type=external constructor time (milliseconds) = 0 open time (milliseconds) = 14297 next time (milliseconds) = 32 close time (milliseconds) = 0 optimizer estimated row count: 78377.51 optimizer estimated cost: 166745.12 Source result set: Index Row to Base Row ResultSet for ORDERS: Number of opens = 1 Rows seen = 166333 Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6} constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 10488 close time (milliseconds) = 0 optimizer estimated row count: 78377.51 optimizer estimated cost: 166745.12 Index Scan ResultSet for ORDERS using index IX_ORDERS_TIME at read committed isolation level using instantaneous share row locking chosen by the optimizer Number of opens = 1 Rows seen = 166333 Rows filtered = 0 Fetch Size = 16 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 3438 close time (milliseconds) = 0 next time in milliseconds/row = 0 scan information: Bit set of columns fetched=All Number of columns fetched=2 Number of deleted rows visited=0 Number of pages visited=887 Number of rows qualified=166333 Number of rows visited=166333 Scan type=btree Tree height=3 start position: > on first 1 column(s). Ordered null semantics on the following columns: stop position: >= on first 1 column(s). Ordered null semantics on the following columns: qualifiers: None optimizer estimated row count: 78377.51 optimizer estimated cost: 166745.12 --scott -----Original Message----- From: Sunitha Kambhampati [mailto:[EMAIL PROTECTED] Sent: Friday, September 16, 2005 5:55 PM To: Derby Discussion Subject: Re: derby performance and 'order by' Scott Ogden wrote: > I have observed some interesting query performance behavior and am > hoping someone here can explain. > > In my scenario, it appears that an existing index is not being used > for the 'order by' part of the operation and as a result the > performance of certain queries is suffering. > > Can someone explain if this is supposed to be what is happening and > why? Please see below for the specific queries and their performance > characteristics. > > Here are the particulars: > > --------------------------------- > > create table orders( > > order_id varchar(50) NOT NULL > > CONSTRAINT ORDERS_PK PRIMARY KEY, > > amount numeric(31,2), > > time date, > > inv_num varchar(50), > > line_num varchar(50), > > phone varchar(50), > > prod_num varchar(50)); > > --Load a large amount of data (720,000 records) into the 'orders' table > > --Create an index on the time column as that will be used in the > 'where' clause. > > create index IX_ORDERS_TIME on orders(time); > > --When I run a query against this table returning top 1,000 records, > this query returns very quickly, consistently less than .010 seconds. > > select * from orders > > where time > '10/01/2002' and time < '11/30/2002' > > order by time; > > --Now run a similarly query against same table, returning the top > 1,000 records. > > --The difference is that the results are now sorted by the primary key > ('order_id') rather than 'time'. > > --This query returns slowly, approximately 15 seconds. Why?? > > select * from orders > > where time > '10/01/2002' and time < '11/30/2002' > > order by order_id; > > --Now run a third query against the same 'orders' table, removing the > where clause > > --This query returns quickly, around .010 seconds. > > select * from orders > > order by order_id; > > --------------------------------------------- > If you run with derby.language.logQueryPlan=true, the actual query plans used for the following queries will be written to derby.log. This will show what indexes was used by the optimizer. Also see http://db.apache.org/derby/docs/10.1/tuning/rtunproper43414.html . Query with 'order by' will require sorting. Usually, sorting requires an extra step to put the data into the right order. This extra step can be avoided for data that are already in the right order. For example, if a single-table query has an ORDER BY on a single column, and there is an index on that column, sorting can be avoided if Derby uses the index as the access path. I think in case of your first and third query the optimizer will pick the available index thus probably avoiding requiring the sort step. Your second query involves more work than the first query, since it has a search condition on time, and an order by order_id. Thus if the optimizer picks the index on time, that will involve a sort step on order_id. ____________ Thanks, Sunitha.