Daniel Noll <[EMAIL PROTECTED]> writes: > Hi all. > > I've been trying to determine the cause for a slowness in a certain query. > I've managed to strip everything away and get to the core of what's making it > slow, but I'd like to understand why. > > The query is of this form: > > SELECT jobs.id AS id, > jobs.name AS name, > (SELECT COUNT(1) FROM jobitems > WHERE jobitems.jobid = jobs.id) AS count > FROM jobs > ORDER BY jobs.name > > This takes around 3000ms to complete if there are 200,000 rows in jobitems. > > If I remove the "count" subquery, then it runs in 1ms. If I remove the ORDER > BY, it runs in 1ms. > > There is only one row in jobs, thus I would expect the same performance > whether I order it or not. The PK on jobitems is (jobid,itemid); prior posts > on this forum have led me to believe that this creates an implicit UNIQUE > INDEX on (jobid,itemid) and also an implicit INDEX on (jobid). Whether this > is the case or not, creating an additional INDEX on (jobid) itself doesn't > speed things up. > > If I remove the count part and then execute it by itself, I get a total query > time of 3ms for the two queries. No nested query should take longer than it > takes to execute its parts separately. > > I discovered this issue on version 10.3.1.4 but it still performs the > same way under version 10.3.2.1.
It sounds like the optimizer is making a bad decision for some reason, but I don't know why (perhaps the statistics are out of date? See http://thread.gmane.org/gmane.comp.apache.db.derby.user/8100/focus=8101). Since you seem to have a small reproducible test case for this problem, it would be great if you could file a JIRA issue and upload one sql script that creates the tables/indices and insert enough sample data to show the problem, and one sql script that runs the different queries. > The obvious workaround is not to order by name and sort the results on the > Java side. But is there a way to rewrite this query that will make Derby > happier? I don't know if it makes Derby happier, but I think this query does the same thing without the nested query: SELECT jobs.id, jobs.name, count(*) FROM jobs, jobitems WHERE jobs.id = jobitems.jobid GROUP BY jobs.id, jobs.name ORDER BY jobs.name -- Knut Anders
