Re: Performance issue of derby using JDBC

Rick Hillegas Wed, 17 Dec 2008 10:09:33 -0800

Hi Harshad,

Some comments inline...


Harshad wrote:

Rick Hillegas wrote:

Depending on your data, the two queries could return different results,
so Bryan's faster query is not a transformation which Derby would

perform automatically.


Oh yes! Although my data ensures that the subquery would return distinct 
results, Derby wouldn't know about it. Is there a way to say that the 
combination of two columns is unique?

When I tried to specify DISTINCT explicitly like this:
select name,version,release,time from PKG where PKG.id in (select DISTINCT 
PROVIDES.id from PROVIDES where (PROVIDES.name = ?))

...I didn't get any change in the performance figures. That is probably because of the 
"DISTINCT elimination in IN sub-queries" mentioned in "Tuning derby".

I would have thought that the DISTINCT is not redundant and actually allows for 
the transformation into a normal join.

I think that, for instance, a problem arises when there are M tuples inPROVIDES with name='Fred' but there are < M distinct ids in thosetuples. In that case, you can imagine data sets on which the originalquery returns < M tuples but the transformed query returns M tuples.

Nevertheless, it looks to me as though your
original query almost qualifies for the EXISTS transformation documented

in the Derby Tuning Guide.


Why do you say "almost qualifies"; where is it not qualifying? And, will the 
EXISTS transformation speed up the query as much as the transformation to a normal join?

I haven't looked under the hood here. The Tuning Guide doesn'tcompletely describe the optimizer's reasoning. A smart optimizer won'tunconditionally flatten subqueries.

Without having looked at the code (but knowing who wrote it), I suspectthat the EXISTS transformation may be a normal join combined with atuple filter which throws away duplicates on a sorted leg of the join.

It may be that Derby did apply the EXISTS transformation--and thatdegraded rather than improved the performance. If you are interested inseeing the plan which Derby selected, please consult the following wikipages:


 http://wiki.apache.org/db-derby/PerformanceDiagnosisTips
 http://wiki.apache.org/db-derby/StmtExecutionPlan

I realize that I'm raising more questions than I'm answering.

Hope this is a little useful,
-Rick

Thanks to all who have responded on this thread.

I have learnt not to trust the execution time of Statement.executeQuery(). It 
looks like the actual table access is made lazily (as and when the result set 
is accessed).

Re: Performance issue of derby using JDBC

Reply via email to