Re: Bad performance with UPDATE on a nested SELECT

Mikkel Kamstrup Erlandsen Tue, 13 Jan 2009 07:16:43 -0800

On Tue, 2009-01-13 at 11:59 +0100, Knut Anders Hatlen wrote:
> Mikkel Kamstrup Erlandsen <[email protected]> writes:
> 
> > On Mon, 2009-01-12 at 13:49 +0100, Knut Anders Hatlen wrote:
> >> Mikkel Kamstrup Erlandsen <[email protected]> writes:
> >> 
> >> > Hi list,
> >> >
> >> > I am seeing some bad performance on an UPDATE on a nested SELECT. The
> >> > query in case takes about 30s, but I think it should be a lot faster
> >> > because the same query on a PostgresQL runs < 1 ms... Anyway, here's the
> >> > setup:
> >> >
> >> > I have two tables 'records' and 'relations'.
> >> >
> >> > The 'records' table stores a bunch of records along with some metadata.
> >> > It has a unique index 'i' on the 'id' column and a normal index 'm' on
> >> > the  'mtime' column. This table holds about 1.5M rows.
> >> >
> >> > The 'relations' table stores parent/child relations between records and
> >> > has two rows 'parentId' and 'childId'. There is a unique index 'pc' on
> >> > (parentId,childId) and a normal index 'c' on childId. This table holds
> >> > about 35k rows.
> >> 
> >> Hi Mikkel,
> >> 
> >> Derby sometimes has problems with the cardinality statistics getting
> >> stale in non-unique indexes, which may lead to bad decisions being made
> >> by the optimizer. Do you still see the problem if you compress the
> >> relations table (alternatively, you could drop the index on childId and
> >> recreate it)? If this speeds up the execution, you're probably
> >> experiencing these issues:
> >> 
> >> https://issues.apache.org/jira/browse/DERBY-269
> >> https://issues.apache.org/jira/browse/DERBY-3788
> >> 
> >
> > Hi Knut,
> >
> > I just tried compressing both the 'relations' and the 'records' table.
> > This makes the query run in about 20s. So a small win, but nothing
> > significant.
> 
> Right, that suggests that you're seeing a different problem. Just to be
> sure that the old query plan isn't still lying around in the statement
> cache, it might be worth checking if clearing the statement cache
> changes anything:
> 
>   CALL SYSCS_UTIL.SYSCS_EMPTY_STATEMENT_CACHE()


That didn't change anything...

> 1) The IN clause is rewritten to a join with SUMMA_RECORDS as the outer
> table, and we have no information to limit the scan of the outer table,
> hence a full table scan is performed. It would probably have been better
> if SUMMA_RECORDS were the inner table. Then the C index could be used to
> enforce the restriction (childId='...') on the outer table,
> SUMMA_RELATIONS, and index I could be used to perform lookups in the
> inner table.
> 
> 2) The index scan on SUMMA_RELATIONS uses the index PC. Since that index
> is on the columns (parentId, childId), it is primarily sorted on
> parentId, so the scan needs to go through the entire index, for each row
> in the outer table, in order to match childId with the criterion in the
> WHERE clause of the nested select. It would probably be better to use
> the index on childId (index C) instead of PC (or reverse the order of
> the columns in the PC index).

I just tried dropping the PC index (as suggested by Mike in another
thread) and replacing it with a non-unique index P only on the parentId
column of the relations table. This did not have any significant impact
on execution time (still ~20s).

I attached query plan and dblook dumps to
https://issues.apache.org/jira/browse/DERBY-4007 for anybody
interested...

> Derby allows you to override some of the optimizer's decisions, but it's
> not as easy for UPDATE statements as it is for SELECT statements, so I
> don't know how to tell it not to pick the table scan. Using the C index
> instead of the PC index should be easy enough, though:
> 
> UPDATE summa_records SET base='my_base' WHERE id IN
>   (SELECT parentId FROM summa_relations --DERBY-PROPERTIES index=C
>       WHERE childId='horizon_2332668')

Hmmm... I just get:
ERROR 42X41: In the Properties clause of a FROM list, the property
'index' is not valid (the property was being set to 'C').

Does this require Derby trunk or something?

Cheers,
Mikkel

Re: Bad performance with UPDATE on a nested SELECT

Reply via email to