Re: FW: Advice on very badly performing query

Matt Doran Mon, 03 Dec 2007 14:34:14 -0800

Hi Michael,

Michael Segel wrote:

The short simple answer... You get what you pay for.
The longer answer... Query optimization is a black art. Cloudscape was designed as a lightweight, no-frills embeddable DB.
Now Cloudscape has morphed into Derby and JavaDB. But you lose the input from the folks at IBM who handle query optimization.

But I'm more than happy to work within the constraints of Derby, if only I could understand them. And that's the help I was looking for here.

I have to run to a customer site, but using one of your examples... you noticed that the query performance changed when you had the field in the select columns as well as the where clause, but not when you had the field just in the where clause. So keep it in the selected fields. You could also try changing the order of the tables you're joining.

I didn't change the where clause; just changing the select fields causes the dramatic query plan change. I have a feeling it might be the fact that I'm selecting an attribute from the 5th join table ... but I'd like a better understanding of what's triggering the change so I can avoid it if possible.

And you may want to reflect your 5 table join. Depending on the database and its tuning, joining more than 3-4 tables can have a drastic negative impact on its performance.

I'm not sure what you mean by "reflect your 5 table join".

The fundamental issue here is that in this poorly performing case, Derby is not looking at the index on the very large table that would immediately reduce the dataset. For whatever reason, the optimizer is making the worst possible decision.

And if speed really is important, look at Informix (IDS 11), now offered by IBM.

Unfortunately, as an off-the-shelf Java application that runs on Windows, Mac and Linux ... we really need a simple embedded DB that we can ship as the default. Unfortunately Derby's query optimizer lets it down badly sometimes.

HTH
------------------------------------------------------------------------

*From:* Matt Doran [mailto:[EMAIL PROTECTED]
*Sent:* Sunday, December 02, 2007 11:06 PM
*To:* [email protected]
*Subject:* Advice on *very* badly performing query (with reproduction recipe)
Hi there,
We use Apache Derby in our commercial application, PaperCut NG <http://www.papercut.com/>. It's proven to be very reliable; however, we occasionally get reports of very bad performance in some areas. We haven't had the time to investigate them fully before (usually upgrading to an external DB like Postgres or SQL Server fixes the issue). This time we looked at a recent report in more detail, and we've found some very strange performance characteristics ... and would love some advice and assistance.
We have a query that does inner joins across 5 tables. It's quite a simple query, but the core table has about 300,000 rows, and we're limiting the results based on a date in that table that is indexed. Here's a summary of my situation/findings:
    * Using the latest Derby release 10.3.1.4, with a Java 1.5 VM on
      Windows.
    * We only have a single WHERE clause, which is on the indexed date
      field and restricts the data such that no data is returned,
      e.g. log_date > (latest log date). So Derby should quickly
      detect there is little/no data to return.
    * Running the original query takes 22 minutes running 100% CPU.
    * Running a count(*) for the same query is quick (< 1 sec).
    * After removing the ORDER BY and changing the select list to just
      include a single field from each table, it still takes 22
      minutes.
    * After changing the select list to retrieve only a single field
      from 2 of the tables, it still takes 22 minutes (I have a log of
      the query and the runtime stats for this attached
      "derby-slow.log").
    * Changing the select list to a single field from 1 of the tables
      makes the query run fast - less than a second. (I have a log of
      the query and the runtime stats for this attached "derby-fast.log").
    * Running the original query on the same dataset in PostgreSQL or
      SQL Server is very fast (less than a second).  This is why we
      often recommend customers upsize to Postgres or SQL Server.
    * Also the SQL is generated via Hibernate ORM, so we have some
      limitations in how we can modify the SQL.
From the query plan it seems that it stops using the date index on the "tbl_printer_usage_log" log table, and changes from Hash joins to Nested Loop joins. On a large table like this, when providing a where clause on a field that is indexed ... we have to ensure that Derby uses the index.
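To illustrate why the switch in join strategy matters so much at this scale, here is a minimal sketch that counts the work done by a naive nested loop join (no index on the inner side) versus a hash join. It is not how Derby implements either strategy; the class and method names are hypothetical, and the 1,000-row inner table is an assumed figure chosen only for illustration (the 300,000 comes from the table size above).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinCostSketch {
    // Nested loop join with no usable index: every outer row is compared
    // against every inner row, so the work is outer * inner.
    static long nestedLoopComparisons(int[] outerKeys, int[] innerKeys) {
        long comparisons = 0;
        for (int o : outerKeys) {
            for (int i : innerKeys) {
                comparisons++;
                if (o == i) {
                    // a joined row would be emitted here
                }
            }
        }
        return comparisons;
    }

    // Hash join: one pass to build a hash table on the inner side,
    // then one probe per outer row, so the work is outer + inner.
    static long hashJoinOperations(int[] outerKeys, int[] innerKeys) {
        Map<Integer, List<Integer>> table = new HashMap<>();
        long operations = 0;
        for (int idx = 0; idx < innerKeys.length; idx++) {
            table.computeIfAbsent(innerKeys[idx], k -> new ArrayList<>()).add(idx);
            operations++;
        }
        for (int o : outerKeys) {
            table.get(o); // probe; matches would be emitted here
            operations++;
        }
        return operations;
    }

    public static void main(String[] args) {
        int[] outer = new int[300_000]; // size of the large log table in the post
        int[] inner = new int[1_000];   // assumed size, for illustration only
        System.out.println("nested loop: " + nestedLoopComparisons(outer, inner));
        System.out.println("hash join:   " + hashJoinOperations(outer, inner));
    }
}
```

With several such joins stacked and no index restricting the large table first, the multiplicative cost of the nested loops compounds, which would be consistent with minutes of 100% CPU.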
If I increase the pageCacheSize to 100,000 pages, it reduces the time of the query to about 2-3 minutes, but it's still very slow compared to when the correct index is used.
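For reference, a minimal sketch of how the page cache might be raised in an embedded setup, assuming it is done through the `derby.storage.pageCacheSize` system property before the Derby engine boots (the same property can also go in derby.properties). The class and method names are hypothetical.

```java
public class PageCacheConfig {
    // Must run before the embedded Derby driver is loaded, since the
    // property is read when the engine boots.
    static void configurePageCache(int pages) {
        System.setProperty("derby.storage.pageCacheSize", Integer.toString(pages));
    }

    public static void main(String[] args) {
        configurePageCache(100000); // the value tried above
        System.out.println(System.getProperty("derby.storage.pageCacheSize"));
    }
}
```

Note that the cache is counted in pages, so assuming 4 KB pages, 100,000 pages is roughly 400 MB of heap, which is a lot to ask of a default embedded install.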
Can anyone please help me understand the following:

    * Why does the query plan change dramatically, just by changing
      the fields that are retrieved?
    * Why is derby avoiding the most obvious index?  The date field in
      the 300,000 row table (the date field is the only field in our
      where clause).
    * Is there any way to avoid this behavior?
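One avenue that may be relevant to the last question is Derby's optimizer overrides, which let a query name the index to use via a `--DERBY-PROPERTIES` comment after the table name. The sketch below only builds such a SQL string; the index name `idx_usage_log_date` and the helper names are hypothetical, and whether Hibernate-generated SQL can carry the override at all is an open question, so this is more likely applicable to hand-written or native queries.

```java
public class DerbyIndexHint {
    // Builds a FROM-clause fragment carrying a Derby optimizer override.
    static String tableWithIndexHint(String table, String indexName) {
        // The trailing newline matters: the override is written as a SQL
        // comment, so it runs to the end of the line.
        return table + " --DERBY-PROPERTIES index=" + indexName + "\n";
    }

    public static void main(String[] args) {
        String sql = "SELECT COUNT(*) FROM "
                // index name below is hypothetical
                + tableWithIndexHint("tbl_printer_usage_log", "idx_usage_log_date")
                + "WHERE log_date > ?";
        System.out.println(sql);
    }
}
```

The same override mechanism also has joinOrder and joinStrategy properties, which might help if forcing the index alone does not bring back the hash joins.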
If we can understand what's causing this, we'll be able to make much more effective use of Derby. At the moment, for customers with large datasets, we just recommend they "upsize" to Postgres or SQL Server and the problem goes away. However, we'd much prefer to fix this and have our Derby database behave better.
I'd be happy to provide the Derby database that exhibits these problems if someone would like to see what's going on. The database is from a customer, so I don't want to post it publicly, but if you send me an email off-list I'd be happy to provide it.
Regards.

--
Matt Doran
PaperCut Software International Pty. Ltd.
Phone:   +61 (3) 9807 5767
E-mail:  [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
Profile: http://www.papercut.com/about/#matt
Blog:    http://www.papercut.com/blog/
