[
https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645816#action_12645816
]
A B commented on DERBY-581:
---------------------------
> Is there actually a way to order and limit at the same time? There does not
> seem to be.
>> I'm afraid there's no way to do that currently.
Well, there is *one* way to do this, at least for simple queries, though it's
far from intuitive. I don't know if this is really usable for your environment,
but note the following quote from a comment on DERBY-2998:
"[The rows returned from the subquery have no guaranteed ordering (Derby
doesn't allow ORDER BY in subqueries), and thus any predicate which restricts
based on row_number() will restrict the rows based on an *undefined* order.
Since the order of the rows from the subquery may depend on the presence of
indexes, the set of rows which survives a row_number()-based restriction may
depend on the indexes, as well. In the end I do _not_ think this is a bug--but
it does strike me as a probable point of confusion for users. It seems that
anyone who wants "the first x rows only" has to either accept the fact that
"first" does not imply "ordered" (and thus results can vary depending on what
conglomerate the optimizer chooses), or else s/he has to use optimizer
overrides to force the optimizer to use an index which is ordered on the
desired columns."
So given that, I think you could try something like:
create table testing (i int, title varchar(40), author varchar(20));
insert into testing values (5, 'title_5', 'author_5');
insert into testing values (4, 'title_4', 'author_4');
insert into testing values (1, 'title_1', 'inigo');
insert into testing values (3, 'title_3', 'author_3');
insert into testing values (2, 'title_2', 'montoya');
-- Your original query; subquery results aren't ordered so you don't know
-- which rows you'll actually get back.
SELECT * FROM (
SELECT ROW_NUMBER() OVER () as rownum, testing.*
FROM testing
WHERE title is not null
) AS tmp
WHERE rownum >= 2 and rownum < 4;
ROWNUM |I |TITLE |AUTHOR
-----------------------------------------------------------------------------
2 |4 |title_4 |author_4
3 |1 |title_1 |inigo
-- Create an index whose ordering matches the ordering you want for your
-- subquery.
create index ix_title_asc on testing(title asc);
-- Now use optimizer overrides to force the index for the subquery. That will
-- force the subquery results to come back in index order, which then means
-- your limit using row_number gives you predictable results.
SELECT * FROM (
SELECT ROW_NUMBER() OVER () as rownum, testing.*
FROM testing --DERBY-PROPERTIES index=ix_title_asc
WHERE title is not null
) AS tmp
WHERE rownum >= 2 and rownum < 4;
ROWNUM |I |TITLE |AUTHOR
-----------------------------------------------------------------------------
2 |2 |title_2 |montoya
3 |3 |title_3 |author_3
As I said, it's not at all intuitive and it's probably not a viable option for
complicated queries. But I thought I'd post it just in case it proves useful...
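One more note: the same trick should work for descending order as well, as long as the index direction matches the ordering you want. A sketch (untested, reusing the 'testing' table above):

create index ix_title_desc on testing(title desc);

-- Force the descending index; rows should now come back in reverse title
-- order before row_number() is applied.
SELECT * FROM (
SELECT ROW_NUMBER() OVER () as rownum, testing.*
FROM testing --DERBY-PROPERTIES index=ix_title_desc
WHERE title is not null
) AS tmp
WHERE rownum >= 2 and rownum < 4;

The same caveat applies: the "ordering" here is entirely a side effect of the access path the override forces, not anything guaranteed by the SQL itself.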
> Modify SQL to skip N rows of the result and return the next M rows
> ------------------------------------------------------------------
>
> Key: DERBY-581
> URL: https://issues.apache.org/jira/browse/DERBY-581
> Project: Derby
> Issue Type: New Feature
> Components: SQL
> Environment: All
> Reporter: Craig Russell
> Assignee: Bryan Pendleton
> Priority: Minor
>
> I agree that the information should be expressed in SQL so that the query
> optimized and execution strategy can know what the user needs in terms of
> cardinality.
> I'd also like to ask that when we consider extending the SQL in this manner
> we consider skipping the first N rows and returning the next M rows.
> Craig
> On Sep 20, 2005, at 10:19 AM, Suavi Ali Demir wrote:
> Another little detail about optimization is that Statement.setMaxRows() kind
> of functions on the JDBC side may not be sufficient since it is called after
> SQL statement is prepared and returned as an object (after query plan is
> built). Therefore, it may be necessary to have language syntax to indicate
> the intention to fetch first 1000 rows only, so that when the query is
> prepared, this intention can be taken into account.
> Regards,
> Ali
> Mike Matrigali <[EMAIL PROTECTED]> wrote:
> As craig points out it is important in performance testing to say
> exactly what you are measuring. In general Derby will try to
> stream rows to the user before it has finished looking at all rows.
> So often looking at the first row and stopping will mean that
> many rows have not been processed. BUT when an order by is involved
> and the query plan either has no appropriate matching index, or decides
> to use a different index, then all the rows are processed, sent to the
> sorter, and only after all rows have been processed are they streamed to
> the client.
> So as you have seen reading the first 1000 rows of a much larger data
> set can happen very quickly.
> As subsequent mail threads have pointed out, returning the top 1000
> sorted rows is an interesting problem which could be costed and executed
> differently if that information was pushed into the optimizer and the
> sorter (and medium level projects were done in those areas).
> > On Sep 16, 2005, at 4:42 PM, Scott Ogden wrote:
> >
> >
> >
> > I have observed some interesting query performance behavior and am
> > hoping someone here can explain.
> >
> > In my scenario, it appears that an existing index is not being used for
> > the 'order by' part of the operation and as a result the performance of
> > certain queries is suffering. Can someone explain if this is supposed
> > to be what is happening and why? Please see below for the specific
> > queries and their performance characteristics.
> >
> > Here are the particulars:
> >
> > ---------------------------------
> >
> > create table orders(
> >
> > order_id varchar(50) NOT NULL
> >
> > CONSTRAINT ORDERS_PK PRIMARY KEY,
> >
> > amount numeric(31,2),
> >
> > time date,
> >
> > inv_num varchar(50),
> >
> > line_num varchar(50),
> >
> > phone varchar(50),
> >
> > prod_num varchar(50));
> > --Load a large amount of data (720,000 records) into the 'orders' table
> >
> > --Create an index on the time column as that will be used in the 'where'
> > clause.
> >
> > create index IX_ORDERS_TIME on orders(time);
> >
> > --When I run a query against this table returning top 1,000 records,
> > this query returns very quickly, consistently less than .010 seconds.
> >>
> >>
> >> select * from orders
> >>
> >> where time > '10/01/2002' and time < '11/30/2002'
> >>
> >> order by time;
> >>
> >> --Now run a similar query against the same table, returning the top
> >> 1,000 records.
> >>
> >> --The difference is that the results are now sorted by the primary key
> >> ('order_id') rather than 'time'.
> >>
> >> --This query returns slowly, approximately 15 seconds. Why??
> >>
> >> select * from orders
> >>
> >> where time > '10/01/2002' and time < '11/30/2002'
> >>
> >> order by order_id;
> >>
> >> --Now run a third query against the same 'orders' table, removing the
> >> where clause
> >>
> >> --This query returns quickly, around .010 seconds.
> >>
> >>
> >>
> >> select * from orders
> >>
> >> order by order_id;
> >>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.