Derby 10.4 now supports the SQL windowing so that for some queries, the JPA result set range limits can be implemented via SQL instead of JDBC. Should improve performance for some queries.

Craig

Begin forwarded message:

From: "Rick Hillegas (JIRA)" <[email protected]>
Date: April 6, 2009 7:07:13 AM PDT
To: [email protected]
Subject: [jira] Commented: (DERBY-581) Modify SQL to skip N rows of the result and return the next M rows


[ https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696097 #action_12696097 ]

Rick Hillegas commented on DERBY-581:
-------------------------------------

Hi Knut,

I think there's more work which could be done to improve our FETCH/ OFFSET support (such as adding ORDER BY to subselects), but I agree that this particular issue can be closed. Thanks.

Modify SQL to skip N rows of the result and return the next M rows
------------------------------------------------------------------

               Key: DERBY-581
               URL: https://issues.apache.org/jira/browse/DERBY-581
           Project: Derby
        Issue Type: New Feature
        Components: SQL
       Environment: All
          Reporter: Craig Russell
          Assignee: Bryan Pendleton
          Priority: Minor

I agree that the information should be expressed in SQL so that the query optimized and execution strategy can know what the user needs in terms of cardinality. I'd also like to ask that when we consider extending the SQL in this manner we consider skipping the first N rows and returning the next M rows.
Craig
On Sep 20, 2005, at 10:19 AM, Suavi Ali Demir wrote:
Another little detail about optimization is that Statement.setMaxRows() kind of functions on the JDBC side may not be sufficient since it is called after SQL statement is prepared and returned as an object (after query plan is built). Therefore, it may be necessary to have language syntax to indicate the intention to fetch first 1000 rows only, so that when the query is prepared, this intention can be taken into account.
Regards,
Ali
Mike Matrigali <[email protected]> wrote:
As craig points out it is important in performance testing to say
exactly what you are measuring. In general Derby will try to
stream rows to the user before it has finished looking at all rows.
So often looking at the first row will and stopping will mean that
many rows have not been processed. BUT when an order by is involved
and the query plan either has no appropriate matching index, or decides to use a different index then all the rows are processed, then they are
sent to the sorter and finally after all rows are processed they are
streamed to the client.
So as you have seen reading the first 1000 rows of a much larger data
set can happen very quickly.
As subsequent mail threads have pointed out, returning the top 1000
sorted rows is an interesting problem which could be costed and executed
differently if that information was pushed into the optimizer and the
sorter (and medium level projects were done in those areas).
On Sep 16, 2005, at 4:42 PM, Scott Ogden wrote:



I have observed some interesting query performance behavior and am
hoping someone here can explain.

In my scenario, it appears that an existing index is not being used for the 'order by' part of the operation and as a result the perfo rmance of certain queries is suffering. Can someone explain if this is supposed
to be what is happening and why? Please see below for the specific
queries and their performance characteristics.

Here are the particulars:

---------------------------------

create table orders(

order_id varchar(50) NOT NULL

CONSTRAINT ORDERS_PK PRIMARY KEY,

amount numeric(31,2),

time date,

inv_num varchar(50),

line_num varchar(50),

phone varchar(50),

prod_num varchar(50));
--Load a large amount of data (720,000 records) into the 'orders' table

--Create an index on the time column as that will be used i n the 'where'
clause.

create index IX_ORDERS_TIME on orders(time);

--When I run a query against this table returning top 1,000 records,
this query returns very quickly, consistently less than .010 seconds.


select * from orders

where time > '10/01/2002' and time < '11/30/2002'

order by time;

--Now run a similarly query against same table, returning the top
1,000 records.

--The difference is that the results are now sorted by the primary key
('order_id') rather than 'time'.

--This query returns slowly, approximately 15 seconds. Why??

select * from orders

where time > '10/01/2002' and time < '11/30/2002'

order by order_id;

--Now run a third query against the same 'orders' table, removing the
where clause

--This query returns quickly, around .010 seconds.



select * from orders

order by order_id;


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Craig L Russell
Architect, Sun Java Enterprise System http://db.apache.org/jdo
408 276-5638 mailto:[email protected]
P.S. A good JDO? O, Gasp!

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to