AW: Derby Scheduler and FETCH FIRST question

g.hohl Fri, 26 Jan 2018 00:51:25 -0800

Hello Rick,

me again. As we had a blackout of our Internet connection I had some
unexpected time. ;-)

So I tried the analysis of Derby:

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)

CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1)

SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY

VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0)

CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(0)

The result of the "VALUES" command was:

> 1                                                   

> ----------------------------------------------------

> Statement Name: \n null\nStatement Text: \n            SELECT  

Not quite what I would have expected. But then I read on a different
page of the documentation that I have to declare where the information
of the analysis should be stored.

So I did ('myapp' is the schema of my app):

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)

CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1)

CALL SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA('myapp')

Executing that command I get:

> ExampleExceptionFormatter: exception message was:
'SYSCS_UTIL.SYSCS_SET_XPLAIN_SCHEMA' is not recognized as a function or
procedure.

I have a Derby 10.14 and took that command from the documentation of
that version. So I'm not sure why it gives me this error message.

Regards,

Gerrit

Von: Hohl, Gerrit 
Gesendet: Freitag, 26. Januar 2018 09:11
An: 'Derby Discussion'
Betreff: AW: Derby Scheduler and FETCH FIRST question

Hello Rick,

and thanks for your reply.

I will try what you've written as soon as I have time for it.

Unfortunately I already switch to another project and I don't know when
I will get the time to have a look on this again (blame my superiors ;-)
).

2) is surely a good idea.

1) Did you also read my 2nd mail?

I also tried using a subselect, so I have a WHERE clause. I had the same
idea as you that the scheduler might not recognize the ORDER BY and
FETCH FIRST.

It was faster, but still not what I would have expected. I've worked a
lot with Borland Interbase / Firebird, MySQL and especially with
PostgreSQL.

And PostgreSQL would have done a lot faster than this.

By the way: PostgreSQL also has a more easy to use approach in aspect of
analysis: https://www.postgresql.org/docs/9.6/static/using-explain.html

Would be great of Derby would offer something similar. 

Regards,

Gerrit

Von: Rick Hillegas [mailto:[email protected]] 
Gesendet: Freitag, 26. Januar 2018 00:39
An: [email protected]
Betreff: Re: Derby Scheduler and FETCH FIRST question

On 1/24/18 4:45 AM, [email protected] wrote:

        Hello everyone,

        I'm using Apache Derby v10.14.1.0 and having some problems using
the FETCH FIRST clauses.

https://db.apache.org/derby/docs/10.14/ref/rrefsqljoffsetfetch.html

        I'm accessing the database using the Derby Embedded driver.

        I have a table which contains some indexes as well as some
fields and a BLOB field. The table is somewhat big (means many rows, ~13
GB).

        I'm using a query like this (timestamp has an index):

        SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10
ROWS ONLY

        The query takes ages (about 27 minutes for that ~13 GB table)
and I can see how Derby slowly fills up my harddisk.

        And a look in the "tmp" folder of the database shows several
".tmp" files.

        First I get several files having 10 MB, then I get two big files
having 5 GB, then the 10 MB files are deleted, then the 5 GB files are
deleted and finally I get the result.

        As I thought something is wrong with my application I also did
the same query on the same database and table using SQuirreL v3.8.1. But
the result is the same.

        I would have expected that the scheduler of Derby would first
look at the timestamp column / index (which should be sorted), taking
the first 10 values from there and

        finally reading the first 10 rows matching these values.

        Instead it seems that it first processes the " SELECT * FROM
history" part (as memory is not sufficient it swaps it to the harddisk),
orders it and takes the first 10 elements.

        Is that correct?

        And if that is correct, where is the benefit of FETCH FIRST -
beside that maybe not that much data is transferred (maybe only
interesting if you use Derby not by the Embedded Driver because of the
TCP/IP connection)?

        Regards,

        Gerrit

Hi Gerrit,

Can you share table and index DDL for this problem as well as the query
plan which Derby chose for the query? See the section on "Working with
RunTimeStatistics" in the Derby Tuning Guide:
http://db.apache.org/derby/docs/10.14/tuning/index.html

It may be that Derby did not choose the index. That in turn, may have
happened for 2 reasons:

1) You're selecting all of the columns in the table and there is no
filtering WHERE clause. That reduces the likelihood that Derby will pick
an indexed access path since the optimizer sees this as a full table
scan.

2) I don't think that any optimizer support was built for the FETCH
FIRST clause. That's worth filing a performance bug for. I think that
the FETCH FIRST clause is only applied at execution time in order to
short-circuit the number of rows which are returned.

Thanks,

-Rick

AW: Derby Scheduler and FETCH FIRST question

Reply via email to