[
https://issues.apache.org/jira/browse/DERBY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645816#action_12645816
]
A B commented on DERBY-581:
---------------------------
> Is there actually a way to order and limit at the same time? There does not
> seem to be.
>> I'm afraid there's no way to do that currently.
Well, there is *one* way to do this, at least for simple queries, though it's
far from intuitive. I don't know if this is really usable for your environment,
but note the following quote from a comment on DERBY-2998:
"[The rows returned from the subquery have no guaranteed ordering (Derby
doesn't allow ORDER BY in subqueries), and thus any predicate which restricts
based on row_number() will restrict the rows based on an *undefined* order.
Since the order of the rows from the subquery may depend on the presence of
indexes, the set of rows which survives a row_number()-based restriction may
depend on the indexes, as well. In the end I do _not_ think this is a bug--but
it does strike me as a probable point of confusion for users. It seems that
anyone who wants "the first x rows only" has to either accept the fact that
"first" does not imply "ordered" (and thus results can vary depending on what
conglomerate the optimizer chooses), or else s/he has to use optimizer
overrides to force the optimizer to use an index which is ordered on the
desired columns."
So given that, I think you could try something like:
create table testing (i int, title varchar(40), author varchar(20));
insert into testing values (5, 'title_5', 'author_5');
insert into testing values (4, 'title_4', 'author_4');
insert into testing values (1, 'title_1', 'inigo');
insert into testing values (3, 'title_3', 'author_3');
insert into testing values (2, 'title_2', 'montoya');
-- Your original query; subquery results aren't ordered so you don't know
-- which rows you'll actually get back.
SELECT * FROM (
SELECT ROW_NUMBER() OVER () as rownum, testing.*
FROM testing
WHERE title is not null
) AS tmp
WHERE rownum >= 2 and rownum < 4;
ROWNUM |I |TITLE |AUTHOR
-----------------------------------------------------------------------------
2 |4 |title_4 |author_4
3 |1 |title_1 |inigo
-- Create an index whose ordering matches the ordering you want for your
-- subquery.
create index ix_title_asc on testing(title asc);
-- Now use optimizer overrides to force the index for the subquery. That will
-- force the subquery results to come back in index order, which then means
-- your limit using row_number gives you predictable results.
SELECT * FROM (
SELECT ROW_NUMBER() OVER () as rownum, testing.*
FROM testing --DERBY-PROPERTIES index=ix_title_asc
WHERE title is not null
) AS tmp
WHERE rownum >= 2 and rownum < 4;
ROWNUM |I |TITLE |AUTHOR
-----------------------------------------------------------------------------
2 |2 |title_2 |montoya
3 |3 |title_3 |author_3
As I said, it's not at all intuitive and it's probably not a viable option for
complicated queries. But I thought I'd post it just in case it proves useful...
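One more note: the same trick should work for descending order as well, as long as the index direction matches the ordering you want. A sketch (untested, reusing the 'testing' table above):

create index ix_title_desc on testing(title desc);

-- Force the descending index; rows should now come back in reverse title
-- order before row_number() is applied.
SELECT * FROM (
SELECT ROW_NUMBER() OVER () as rownum, testing.*
FROM testing --DERBY-PROPERTIES index=ix_title_desc
WHERE title is not null
) AS tmp
WHERE rownum >= 2 and rownum < 4;

The same caveat applies: the "ordering" here is entirely a side effect of the access path the override forces, not anything guaranteed by the SQL itself.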
> Modify SQL to skip N rows of the result and return the next M rows
> ------------------------------------------------------------------
>
> Key: DERBY-581
> URL: https://issues.apache.org/jira/browse/DERBY-581
> Project: Derby
> Issue Type: New Feature
> Components: SQL
> Environment: All
> Reporter: Craig Russell
> Assignee: Bryan Pendleton
> Priority: Minor
>
> I agree that the information should be expressed in SQL so that the query
> optimized and execution strategy can know what the user needs in terms of
> cardinality.
> I'd also like to ask that when we consider extending the SQL in this manner
> we consider skipping the first N rows and returning the next M rows.
> Craig
> On Sep 20, 2005, at 10:19 AM, Suavi Ali Demir wrote:
> Another little detail about optimization is that Statement.setMaxRows() kind
> of functions on the JDBC side may not be sufficient since it is called after
> SQL statement is prepared and returned as an object (after query plan is
> built). Therefore, it may be necessary to have language syntax to indicate
> the intention to fetch first 1000 rows only, so that when the query is
> prepared, this intention can be taken into account.
> Regards,
> Ali
> Mike Matrigali <[EMAIL PROTECTED]> wrote:
> As craig points out it is important in performance testing to say
> exactly what you are measuring. In general Derby will try to
> stream rows to the user before it has finished looking at all rows.
> So often looking at the first row and stopping will mean that
> many rows have not been processed. BUT when an order by is involved
> and the query plan either has no appropriate matching index, or decides
> to use a different index, then all the rows are processed, sent to the
> sorter, and only after all rows have been processed are they streamed to
> the client.
> So as you have seen reading the first 1000 rows of a much larger data
> set can happen very quickly.
> As subsequent mail threads have pointed out, returning the top 1000
> sorted rows is an interesting problem which could be costed and executed
> differently if that information was pushed into the optimizer and the
> sorter (and medium level projects were done in those areas).
> > On Sep 16, 2005, at 4:42 PM, Scott Ogden wrote:
> >
> >
> >
> > I have observed some interesting query performance behavior and am
> > hoping someone here can explain.
> >
> > In my scenario, it appears that an existing index is not being used for
> > the 'order by' part of the operation and as a result the performance of
> > certain queries is suffering. Can someone explain if this is supposed
> > to be what is happening and why? Please see below for the specific
> > queries and their performance characteristics.
> >
> > Here are the particulars:
> >
> > ---------------------------------
> >
> > create table orders(
> >
> > order_id varchar(50) NOT NULL
> >
> > CONSTRAINT ORDERS_PK PRIMARY KEY,
> >
> > amount numeric(31,2),
> >
> > time date,
> >
> > inv_num varchar(50),
> >
> > line_num varchar(50),
> >
> > phone varchar(50),
> >
> > prod_num varchar(50));
> > --Load a large amount of data (720,000 records) into the 'orders' table
> >
> > --Create an index on the time column as that will be used in the 'where'
> > clause.
> >
> > create index IX_ORDERS_TIME on orders(time);
> >
> > --When I run a query against this table returning top 1,000 records,
> > this query returns very quickly, consistently less than .010 seconds.
> >>
> >>
> >> select * from orders
> >>
> >> where time > '10/01/2002' and time < '11/30/2002'
> >>
> >> order by time;
> >>
> >> --Now run a similar query against the same table, returning the top
> >> 1,000 records.
> >>
> >> --The difference is that the results are now sorted by the primary key
> >> ('order_id') rather than 'time'.
> >>
> >> --This query returns slowly, approximately 15 seconds. Why??
> >>
> >> select * from orders
> >>
> >> where time > '10/01/2002' and time < '11/30/2002'
> >>
> >> order by order_id;
> >>
> >> --Now run a third query against the same 'orders' table, removing the
> >> where clause
> >>
> >> --This query returns quickly, around .010 seconds.
> >>
> >>
> >>
> >> select * from orders
> >>
> >> order by order_id;
> >>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.