Hi Mike, > What data type are you using for 'child_id' and 'parent_id'? > It seems that the way you've written the query that they are character > strings and not integers. > (Not that it would cause the delay.)
They are VARCHAR(255). > If you run the inner select by itself as written, how long does it take? About 1ms. > There was a bug several releases ago concerning the IN clause performance, > this could be related. This problem is on 10.4.2 > It would not be a good thing if you are doing a sequential scan on the > entire 1.5 mil rows, which is what you could be seeing in the 30 sec run. > > Another test that you can use to confirm the problem is to create a subset > of the records table with 10,000 rows. If the same query runs with a low > (sub second) response, then you've verified that you're doing a table scan. I just tried on a base half the size (~700k rows) and it runs in about 5-10s. The really creepy about this case is that my test run was a case where the nested SELECT was empty!.. > Assuming that this is a bug, and assuming that your id's are character > strings and not integers, try creating them as integers and see if that > helps the IN clause performance. (It's a gut feel, a hunch). This will be a substantial amount of work and will not be acceptable as an end solution, so I will prefer not going down that path... Cheers, Mikkel > > -----Original Message----- > > From: Mikkel Kamstrup Erlandsen [mailto:[email protected]] > > Sent: Monday, January 12, 2009 6:00 AM > > To: [email protected] > > Subject: Bad performance with UPDATE on a nested SELECT > > > > Hi list, > > > > I am seeing some bad performance on an UPDATE on a nested SELECT. The > > query in case takes about 30s, but I think it should be a lot faster > > because the same query on a PostgresQL runs < 1 ms... Anyway, here's the > > setup: > > > > I have two tables 'records' and 'relations'. > > > > The 'records' table stores a bunch of records along with some metadata. > > It has a unique index 'i' on the 'id' column and a normal index 'm' on > > the 'mtime' column. This table holds about 1.5M rows. > > > > The 'relations' table stores parent/child relations between records and > > has two rows 'parentId' and 'childId'. There is a unique index 'pc' on > > (parentId,childId) and a normal index 'c' on childId. This table holds > > about 35k rows. > > > > The problematic query looks like: > > > > UPDATE records > > SET base='my_base' > > WHERE id IN ( > > SELECT parentId > > FROM relations > > WHERE childId='id_1'); > > > > The right SELECT returns two rows (and in general 0-3 or something like > > that). If I instead of the SELECT hardcode two ids, making the right > > part look like "IN ('id_2', 'id_3')" the query runs in < 1ms. > > > > It would appear that Derby iterates over the entire 'records' table or > > something like that. This also appears to be the case as far as I can > > read from the attached query plan... > > > > Any pointers or ideas on how to tackle this are most appreciated. > > > > Cheers, > > Mikkel > >
