On Wed, 2009-01-14 at 14:15 +0100, [email protected] wrote: > > > -----Original Message----- > > From: Mikkel Kamstrup Erlandsen [mailto:[email protected]] > > Sent: Wednesday, January 14, 2009 12:12 AM > > To: Derby Discussion > > Cc: [email protected] > > Subject: RE: Bad table design leads to bad performance...RE: Bad > > performance with UPDATE on a nested SELECT > [SNIP] > > > > > > As I said above, I just tried out your strategy. Using only integer > > > handles the query runs in about 4s. I still need a factor 100 better > > > than that... > > > > Sorry, forgot to add that this was on a base with ~700k rows only, not > > the full 1,5M rows one... > > > > Cheers, > > Mikkel > > Mikkel, > > Sorry if I'm being a bit slow... > > That 4 seconds was on 700K rows using integers instead of varchars? > Does that also include the optimizer hint or was that without the optimizer > hint?
Yes. And it was without the optimizer hint. > The reason I ask is that if your query was the same and you just switched > from varchar to integers, then there clearly is an issue with varchars and > the optimizer. Yeah, the query was the exact same, just with INTEGER instead of VARCHAR(255) for the id column. For reference, the query we are talking about: UPDATE records SET base='my_base' WHERE id IN ( SELECT parentId FROM relations WHERE childId='id_1'); I am not sure that the issue is about varchars. As I stated in an earlier mail I had a running time of 5-10s on the 700k row base when I was using varchars. This means that an identity index on an integer column gives a small speed up (maybe 30% if I am to pull a random number out of my hat). What I was hoping for was more like 3000%... > If I understood you correctly, it sounded like when you tried the optimizer > hint, using varchars, you got the same result as the integers. Is that the > case? More or less... The integer case might be slightly faster, but I don't have precise timings to back this up. In any case the difference here is very small. Generally I would say that the optimizer hint doesn't provide any real difference. > If you're going to loop through the result set, you may find it faster to > delete the old row and then perform the insert. Updates are hard(er) on a > database. I am just doing a manual loop over the SELECT result set now (it usually contains 0-5 rows) and doing an update for each row. It works at an acceptable speed. < 1ms for each op. > Derby isn't Postgres and there are some neat things you can do in C that you > can't do in Java that could give a C based database an edge. (Pointers can > be your friend. ;-) At the same time, there are design decisions that could > have been made which are hampering performance now. And C doesn't have a JIT compiler :-) Java programs can be blazingly fast. Lucene impresses me on a daily basis :-) I don't believe this issue is about Java. At this point I am quite certain that it is a problem with the query optimizer in Derby. Cheers, Mikkel
