Hi Rick,
ah, okay, that was the missing piece of puzzle. And as I mentioned PostgreSQL before: It seems that ORs are a general problem. https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/ But it seems there are some techniques or patterns to avoid them. One of them is using an UNION: SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, l.message FROM log l WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?)) ORDER BY l.create_timestamp DESC, l.id DESC; --> Total: 0.187s SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, l.message FROM log l WHERE ((l.entity_cls = 'Installation') AND (l.entity_id IN ( SELECT i.id FROM installation i WHERE (i.license_id = ?) ))) ORDER BY l.create_timestamp DESC, l.id DESC; --> Total: 0.407s SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, l.message FROM log l WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?)) OR ((l.entity_cls = 'Installation') AND (l.entity_id IN ( SELECT i.id FROM installation i WHERE (i.license_id = ?) ))) ORDER BY l.create_timestamp DESC, l.id DESC; --> Total: 23.190s SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, l.message FROM log l WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?)) UNION SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, l.message FROM log l WHERE ((l.entity_cls = 'Installation') AND (l.entity_id IN ( SELECT i.id FROM installation i WHERE (i.license_id = ?) ))) ORDER BY create_timestamp DESC, id DESC; --> Total: 0.624s Not bad, I would say. Maybe one remark: I used also aliases in the ORDER BY clause. But you can't as you work on the result of the UNION which only consists of the columns without their aliases. Thanks a lot, Rick. That helped me a lot. Regards Gerrit -----Ursprüngliche Nachricht----- Von: Rick Hillegas <rick.hille...@gmail.com> Gesendet: Dienstag, 7. Juli 2020 00:48 An: Derby Discussion <derby-user@db.apache.org>; Hohl, Gerrit <g.h...@aurenz.de> Betreff: Re: AW: AW: Questions about query execution and optimization Hi Gerrit, One thing I notice about your big WHERE clause is that it contains an OR. In general, ORs are not optimizable. You will end of with full table scans. Please see http://db.apache.org/derby/docs/10.15/tuning/ctunoptimz39106.html Hope this helps, -Rick On 7/6/20 6:21 AM, Hohl, Gerrit wrote: > Hi Rick, > > > thanks for your message. > > > Thanks also for the link, also I'm not sure I understand its content > correctly. > My understand of indexes so far was that they are used to filter tables when > performing a query. > Let's say we have a table PERSON containing the name and age of persons. We > have an index on the age column. > In our query we want all persons which are 30 years old: SELECT * FROM > person WHERE age = 30; I would expect that the DBMS - after it analysed the > query - will look for all "30" entries in the age index. > Those contain pointers on the original records / tuples in the table file. > Let's say we have 10 matches in the age index, the DBMS will get those 10 > matches from the table file. > Of course, if all columns we select are already in the index - in the best > case we have "SELECT age FROM person WHERE age = 30;" - the query would be > extremely fast. > And it would be slower if additional columns which are not backed by an index > can also be found in the WHERE clause. > > But even set that aside: In my 1st mail I selected the same columns in the > big query as well as in the two smaller queries. > There was no difference between them in this aspect. > The difference was in the WHERE clause - which was more complicated in the > big query and a simpler in the two smaller queries. > But in all versions the same columns were used in the WHERE clause, all of > them backed by indexes. > So it won't also explain the behaviour. > > > About the XPLAIN style things: I guess you're talking about this here. > https://db.apache.org/derby/docs/10.14/tuning/ctun_xplain_style.html > Seems like a neat little project on its own... > > > Regards, > Gerrit > > -----Ursprüngliche Nachricht----- > Von: Rick Hillegas <rick.hille...@gmail.com> > Gesendet: Montag, 6. Juli 2020 14:54 > An: Derby Discussion <derby-user@db.apache.org>; Hohl, Gerrit > <g.h...@aurenz.de> > Betreff: Re: AW: Questions about query execution and optimization > > Hi Gerrit, > > I suspect that your query performs poorly because your indexes do not > cover the query. That means that you are selecting columns which don't > appear in the indexes. In this case, the optimizer knows that Derby > cannot satisfy the query by simply reading index pages. Derby also has > to read base table pages in order to fetch the other columns. Please > see http://db.apache.org/derby/docs/10.15/tuning/ctunoptimz30768.html > > Derby query plans are not easy to read. If you are snapshotting plans with > XPLAIN style, then the details go into snapshot tables. In theory, you could > write a tool to read those snapshot results and format them better. > > Hope this helps, > -Rick > > On 7/6/20 12:46 AM, Hohl, Gerrit wrote: >> Hello everyone, >> >> >> me again. >> In the meantime I tried to tune the table using the >> >> SYSCS_UTIL.SYSCS_COMPRESS_TABLE >> >> and >> >> SYSCS_UTIL.SYSCS_UPDATE_STATISTICS >> >> functions. It didn't have any effect on the query runtime. >> I also tried the query analyse functions: >> >> >> CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1); >> CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1); >> <Query> >> VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS(); >> CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(0); >> CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0); >> >> The result was long, localized (German in my case - nice)... and absolutely >> cryptic. >> It didn't mention any of the tokens of my SQL query. >> Instead it was taking about UNIONs and JOINs, also I don't use any. At least >> not explicitly. >> I couldn't figure at all which entry of the analysis belonged to which part >> of the SQL query. >> >> I remember when I worked with PostgreSQL there was that neat EXPLAIN command. >> That command had a very good structure of its output. >> And it seems they still have it in their current version: >> https://www.postgresql.org/docs/current/using-explain.html >> >> >> Regards, >> Gerrit >> >> -----Ursprüngliche Nachricht----- >> Von: Hohl, Gerrit <g.h...@aurenz.de> >> Gesendet: Freitag, 3. Juli 2020 16:22 >> An: Derby Discussion <derby-user@db.apache.org> >> Betreff: Questions about query execution and optimization >> >> Hello everyone, >> >> this week I came across a behaviour of Apache Derby which I couldn't explain >> to myself. >> I'm using version 10.14.2.0 and the structure of the database looks like >> this: >> >> CREATE TABLE license ( >> id BIGINT NOT NULL, >> [...] >> ); >> CREATE TABLE installation ( >> id BIGINT NOT NULL, >> [...] >> license_id BIGINT NOT NULL, >> [...] >> ); >> CREATE TABLE log ( >> id BIGINT NOT NULL, >> action VARCHAR(255) NOT NULL, >> create_timestamp TIMESTAMP NOT NULL, >> entity_cls VARCHAR(255), >> entity_id BIGINT, >> type INTEGER NOT NULL, >> message VARCHAR(32672) NOT NULL, >> PRIMARY KEY (ID) >> ); >> CREATE INDEX logcreatetimestampindex ON log (create_timestamp); >> CREATE INDEX logentityclsentityidindex ON log (entity_cls, >> entity_id); ALTER TABLE installation ADD CONSTRAINT >> fkinstallationlicense FOREIGN KEY >> (license_id) REFERENCES license (id); >> >> The log table contains log messages about actions on records in other tables. >> Because it should be useable with all tables, there can't be any foreign >> keys. >> Especially as also logs should be kept when the corresponding record is >> deleted. >> >> Now I wanted to create a query which returns all logs of a certain license >> and its installations: >> >> SELECT l.* >> FROM log l >> WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?)) OR >> ((l.entity_cls = 'Installation) AND (l.entity_id IN ( SELECT i.id >> FROM installation i WHERE (i.license_id = ?) >> ))) >> ORDER BY l.create_timestamp DESC, l.id DESC; >> >> But that thing took forever (~ 12.5s). >> I thought the reason would maybe my index, so I introduced two more: >> >> CREATE INDEX logentityidindex ON log (entity_cls); CREATE INDEX >> logentityclsindex ON log (entity_id); >> >> Unfortunately that didn't change much. The query took almost the exact same >> amount of time. >> Next thing was removing the sub-select and directly giving a list of IDs for >> the installation records - just for testing. >> >> SELECT l.* >> FROM log l >> WHERE ((l.entity_cls = 'License') AND (l.entity_id = 123)) OR >> ((l.entity_cls = 'Installation) AND (l.entity_id IN (234, 345))) >> ORDER BY l.create_timestamp DESC, l.id DESC; >> >> Much to my amazement that also didn't change anything. >> In black despair I split the query into two while keeping the sub-select: >> >> SELECT l.* >> FROM log l >> WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?)) ORDER BY >> l.create_timestamp DESC, l.id DESC; >> >> SELECT l.* >> FROM log l >> WHERE ((l.entity_cls = 'Installation) AND (l.entity_id IN ( SELECT >> i.id FROM installation.id WHERE (l.license_id = ?) >> ))) >> ORDER BY l.create_timestamp DESC, l.id DESC; >> >> Unbelievable: The 1st query took ~0.15s while the 2nd query took ~0.2s. >> As the sub-select seems not to have any effect how fast or slow the query >> is, that can be ignored. >> In all cases I only have columns in my WHERE clause which are in the indexes. >> But somehow Apache Derby seems not to notice it and not using them, if the >> WHERE clause gets too complicated. >> >> Is there anything I haven't seen? Anything I'm doing wrong by structuring >> the query like I did? >> Somehow I don't get it why the execution time gets 100 times slower just by >> having everything in one query. >> >> Regards, >> Gerrit >>