On Tue, 2009-01-13 at 11:59 +0100, Knut Anders Hatlen wrote: > Mikkel Kamstrup Erlandsen <[email protected]> writes: > > > On Mon, 2009-01-12 at 13:49 +0100, Knut Anders Hatlen wrote: > >> Mikkel Kamstrup Erlandsen <[email protected]> writes: > >> > >> > Hi list, > >> > > >> > I am seeing some bad performance on an UPDATE on a nested SELECT. The > >> > query in case takes about 30s, but I think it should be a lot faster > >> > because the same query on a PostgresQL runs < 1 ms... Anyway, here's the > >> > setup: > >> > > >> > I have two tables 'records' and 'relations'. > >> > > >> > The 'records' table stores a bunch of records along with some metadata. > >> > It has a unique index 'i' on the 'id' column and a normal index 'm' on > >> > the 'mtime' column. This table holds about 1.5M rows. > >> > > >> > The 'relations' table stores parent/child relations between records and > >> > has two rows 'parentId' and 'childId'. There is a unique index 'pc' on > >> > (parentId,childId) and a normal index 'c' on childId. This table holds > >> > about 35k rows. > >> > >> Hi Mikkel, > >> > >> Derby sometimes has problems with the cardinality statistics getting > >> stale in non-unique indexes, which may lead to bad decisions being made > >> by the optimizer. Do you still see the problem if you compress the > >> relations table (alternatively, you could drop the index on childId and > >> recreate it)? If this speeds up the execution, you're probably > >> experiencing these issues: > >> > >> https://issues.apache.org/jira/browse/DERBY-269 > >> https://issues.apache.org/jira/browse/DERBY-3788 > >> > > > > Hi Knut, > > > > I just tried compressing both the 'relations' and the 'records' table. > > This makes the query run in about 20s. So a small win, but nothing > > significant. > > Right, that suggests that you're seeing a different problem. Just to be > sure that the old query plan isn't still lying around in the statement > cache, it might be worth checking if clearing the statement cache > changes anything: > > CALL SYSCS_UTIL.SYSCS_EMPTY_STATEMENT_CACHE()
That didn't change anything... > 1) The IN clause is rewritten to a join with SUMMA_RECORDS as the outer > table, and we have no information to limit the scan of the outer table, > hence a full table scan is performed. It would probably have been better > if SUMMA_RECORDS were the inner table. Then the C index could be used to > enforce the restriction (childId='...') on the outer table, > SUMMA_RELATIONS, and index I could be used to perform lookups in the > inner table. > > 2) The index scan on SUMMA_RELATIONS uses the index PC. Since that index > is on the columns (parentId, childId), it is primarily sorted on > parentId, so the scan needs to go through the entire index, for each row > in the outer table, in order to match childId with the criterion in the > WHERE clause of the nested select. It would probably be better to use > the index on childId (index C) instead of PC (or reverse the order of > the columns in the PC index). I just tried dropping the PC index (as suggested by Mike in another thread) and replacing it with a non-unique index P only on the parentId column of the relations table. This did not have any significant impact on execution time (still ~20s). I attached query plan and dblook dumps to https://issues.apache.org/jira/browse/DERBY-4007 for anybody interested... > Derby allows you to override some of the optimizer's decisions, but it's > not as easy for UPDATE statements as it is for SELECT statements, so I > don't know how to tell it not to pick the table scan. Using the C index > instead of the PC index should be easy enough, though: > > UPDATE summa_records SET base='my_base' WHERE id IN > (SELECT parentId FROM summa_relations --DERBY-PROPERTIES index=C > WHERE childId='horizon_2332668') Hmmm... I just get: ERROR 42X41: In the Properties clause of a FROM list, the property 'index' is not valid (the property was being set to 'C'). Does this require Derby trunk or something? Cheers, Mikkel
