Hi Harshad,

Some comments inline...

Harshad wrote:
Rick Hillegas wrote:
Depending on your data, the two queries could return different results,
so Bryan's faster query is not a transformation which Derby would
perform automatically.

Oh yes! Although my data ensures that the subquery would return distinct 
results, Derby wouldn't know about it. Is there a way to say that the 
combination of two columns is unique?

When I tried to specify DISTINCT explicitly like this:
select name,version,release,time from PKG where PKG.id in (select DISTINCT 
PROVIDES.id from PROVIDES where (PROVIDES.name = ?))

...I didn't get any change in the performance figures. That is probably because of the 
"DISTINCT elimination in IN sub-queries" mentioned in "Tuning derby".

I would have thought that the DISTINCT is not redundant and actually allows for 
the transformation into a normal join.
I think that, for instance, a problem arises when there are M tuples in PROVIDES with name='Fred' but there are < M distinct ids in those tuples. In that case, you can imagine data sets on which the original query returns < M tuples but the transformed query returns M tuples.
Nevertheless, it looks to me as though your
original query almost qualifies for the EXISTS transformation documented
in the Derby Tuning Guide.

Why do you say "almost qualifies"; where is it not qualifying? And, will the 
EXISTS transformation speed up the query as much as the transformation to a normal join?
I haven't looked under the hood here. The Tuning Guide doesn't completely describe the optimizer's reasoning. A smart optimizer won't unconditionally flatten subqueries.

Without having looked at the code (but knowing who wrote it), I suspect that the EXISTS transformation may be a normal join combined with a tuple filter which throws away duplicates on a sorted leg of the join.

It may be that Derby did apply the EXISTS transformation--and that degraded rather than improved the performance. If you are interested in seeing the plan which Derby selected, please consult the following wiki pages:

 http://wiki.apache.org/db-derby/PerformanceDiagnosisTips
 http://wiki.apache.org/db-derby/StmtExecutionPlan

I realize that I'm raising more questions than I'm answering.

Hope this is a little useful,
-Rick
Thanks to all who have responded on this thread.

I have learnt not to trust the execution time of Statement.executeQuery(). It 
looks like the actual table access is made lazily (as and when the result set 
is accessed).


Reply via email to