AW: AW: AW: Questions about query execution and optimization

Hohl, Gerrit Tue, 07 Jul 2020 00:03:33 -0700

Hi Rick,


ah, okay, that was the missing piece of puzzle.
And as I mentioned PostgreSQL before: It seems that ORs are a general problem.
https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/

But it seems there are some techniques or patterns to avoid them. One of them 
is using an UNION:

SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, 
l.message
FROM log l
WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?))
ORDER BY l.create_timestamp DESC, l.id DESC;

--> Total: 0.187s

SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, 
l.message
FROM log l
WHERE ((l.entity_cls = 'Installation') AND (l.entity_id IN (
SELECT i.id FROM installation i WHERE (i.license_id = ?)
)))
ORDER BY l.create_timestamp DESC, l.id DESC;

--> Total: 0.407s

SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, 
l.message
FROM log l
WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?))
OR ((l.entity_cls = 'Installation') AND (l.entity_id IN (
SELECT i.id FROM installation i WHERE (i.license_id = ?)
)))
ORDER BY l.create_timestamp DESC, l.id DESC;

--> Total: 23.190s

SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, 
l.message
FROM log l
WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?))
UNION
SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, 
l.message
FROM log l
WHERE ((l.entity_cls = 'Installation') AND (l.entity_id IN (
SELECT i.id FROM installation i WHERE (i.license_id = ?)
)))
ORDER BY create_timestamp DESC, id DESC;

--> Total: 0.624s

Not bad, I would say. Maybe one remark: I used also aliases in the ORDER BY 
clause.
But you can't as you work on the result of the UNION which only consists of the 
columns without their aliases.

Thanks a lot, Rick. That helped me a lot.


Regards
Gerrit

-----Ursprüngliche Nachricht-----
Von: Rick Hillegas <rick.hille...@gmail.com> 
Gesendet: Dienstag, 7. Juli 2020 00:48
An: Derby Discussion <derby-user@db.apache.org>; Hohl, Gerrit <g.h...@aurenz.de>
Betreff: Re: AW: AW: Questions about query execution and optimization

Hi Gerrit,

One thing I notice about your big WHERE clause is that it contains an OR. In 
general, ORs are not optimizable. You will end of with full table scans. Please 
see http://db.apache.org/derby/docs/10.15/tuning/ctunoptimz39106.html

Hope this helps,
-Rick

On 7/6/20 6:21 AM, Hohl, Gerrit wrote:
> Hi Rick,
>
>
> thanks for your message.
>
>
> Thanks also for the link, also I'm not sure I understand its content 
> correctly.
> My understand of indexes so far was that they are used to filter tables when 
> performing a query.
> Let's say we have a table PERSON containing the name and age of persons. We 
> have an index on the age column.
> In our query we want all persons which are 30 years old: SELECT * FROM 
> person WHERE age = 30; I would expect that the DBMS - after it analysed the 
> query - will look for all "30" entries in the age index.
> Those contain pointers on the original records / tuples in the table file.
> Let's say we have 10 matches in the age index, the DBMS will get those 10 
> matches from the table file.
> Of course, if all columns we select are already in the index - in the best 
> case we have "SELECT age FROM person WHERE age = 30;" - the query would be 
> extremely fast.
> And it would be slower if additional columns which are not backed by an index 
> can also be found in the WHERE clause.
>
> But even set that aside: In my 1st mail I selected the same columns in the 
> big query as well as in the two smaller queries.
> There was no difference between them in this aspect.
> The difference was in the WHERE clause - which was more complicated in the 
> big query and a simpler in the two smaller queries.
> But in all versions the same columns were used in the WHERE clause, all of 
> them backed by indexes.
> So it won't also explain the behaviour.
>
>
> About the XPLAIN style things: I guess you're talking about this here.
> https://db.apache.org/derby/docs/10.14/tuning/ctun_xplain_style.html
> Seems like a neat little project on its own...
>
>
> Regards,
> Gerrit
>
> -----Ursprüngliche Nachricht-----
> Von: Rick Hillegas <rick.hille...@gmail.com>
> Gesendet: Montag, 6. Juli 2020 14:54
> An: Derby Discussion <derby-user@db.apache.org>; Hohl, Gerrit 
> <g.h...@aurenz.de>
> Betreff: Re: AW: Questions about query execution and optimization
>
> Hi Gerrit,
>
> I suspect that your query performs poorly because your indexes do not 
> cover the query. That means that you are selecting columns which don't 
> appear in the indexes. In this case, the optimizer knows that Derby 
> cannot satisfy the query by simply reading index pages. Derby also has 
> to read base table pages in order to fetch the other columns. Please 
> see http://db.apache.org/derby/docs/10.15/tuning/ctunoptimz30768.html
>
> Derby query plans are not easy to read. If you are snapshotting plans with 
> XPLAIN style, then the details go into snapshot tables. In theory, you could 
> write a tool to read those snapshot results and format them better.
>
> Hope this helps,
> -Rick
>
> On 7/6/20 12:46 AM, Hohl, Gerrit wrote:
>> Hello everyone,
>>
>>
>> me again.
>> In the meantime I tried to tune the table using the
>>
>>      SYSCS_UTIL.SYSCS_COMPRESS_TABLE
>>
>> and
>>
>>      SYSCS_UTIL.SYSCS_UPDATE_STATISTICS
>>
>> functions. It didn't have any effect on the query runtime.
>> I also tried the query analyse functions:
>>
>>
>>      CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
>>      CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1);
>>      <Query>
>>      VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
>>      CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(0);
>>      CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0);
>>
>> The result was long, localized (German in my case - nice)... and absolutely 
>> cryptic.
>> It didn't mention any of the tokens of my SQL query.
>> Instead it was taking about UNIONs and JOINs, also I don't use any. At least 
>> not explicitly.
>> I couldn't figure at all which entry of the analysis belonged to which part 
>> of the SQL query.
>>
>> I remember when I worked with PostgreSQL there was that neat EXPLAIN command.
>> That command had a very good structure of its output.
>> And it seems they still have it in their current version:
>> https://www.postgresql.org/docs/current/using-explain.html
>>
>>
>> Regards,
>> Gerrit
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Hohl, Gerrit <g.h...@aurenz.de>
>> Gesendet: Freitag, 3. Juli 2020 16:22
>> An: Derby Discussion <derby-user@db.apache.org>
>> Betreff: Questions about query execution and optimization
>>
>> Hello everyone,
>>
>> this week I came across a behaviour of Apache Derby which I couldn't explain 
>> to myself.
>> I'm using version 10.14.2.0 and the structure of the database looks like 
>> this:
>>
>> CREATE TABLE license (
>>      id BIGINT NOT NULL,
>>      [...]
>> );
>> CREATE TABLE installation (
>>      id BIGINT NOT NULL,
>>      [...]
>>      license_id BIGINT NOT NULL,
>>      [...]
>> );
>> CREATE TABLE log (
>>      id BIGINT NOT NULL,
>>      action VARCHAR(255) NOT NULL,
>>      create_timestamp TIMESTAMP NOT NULL,
>>      entity_cls VARCHAR(255),
>>      entity_id BIGINT,
>>      type INTEGER NOT NULL,
>>      message VARCHAR(32672) NOT NULL,
>>      PRIMARY KEY (ID)
>> );
>> CREATE INDEX logcreatetimestampindex ON log (create_timestamp); 
>> CREATE INDEX logentityclsentityidindex ON log (entity_cls, 
>> entity_id); ALTER TABLE installation ADD CONSTRAINT 
>> fkinstallationlicense FOREIGN KEY
>> (license_id) REFERENCES license (id);
>>
>> The log table contains log messages about actions on records in other tables.
>> Because it should be useable with all tables, there can't be any foreign 
>> keys.
>> Especially as also logs should be kept when the corresponding record is 
>> deleted.
>>
>> Now I wanted to create a query which returns all logs of a certain license 
>> and its installations:
>>
>> SELECT l.*
>> FROM log l
>> WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?)) OR 
>> ((l.entity_cls = 'Installation) AND (l.entity_id IN ( SELECT i.id 
>> FROM installation i WHERE (i.license_id = ?)
>> )))
>> ORDER BY l.create_timestamp DESC, l.id DESC;
>>
>> But that thing took forever (~ 12.5s).
>> I thought the reason would maybe my index, so I introduced two more:
>>
>> CREATE INDEX logentityidindex ON log (entity_cls); CREATE INDEX 
>> logentityclsindex ON log (entity_id);
>>
>> Unfortunately that didn't change much. The query took almost the exact same 
>> amount of time.
>> Next thing was removing the sub-select and directly giving a list of IDs for 
>> the installation records - just for testing.
>>
>> SELECT l.*
>> FROM log l
>> WHERE ((l.entity_cls = 'License') AND (l.entity_id = 123)) OR 
>> ((l.entity_cls = 'Installation) AND (l.entity_id IN (234, 345))) 
>> ORDER BY l.create_timestamp DESC, l.id DESC;
>>
>> Much to my amazement that also didn't change anything.
>> In black despair I split the query into two while keeping the sub-select:
>>
>> SELECT l.*
>> FROM log l
>> WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?)) ORDER BY 
>> l.create_timestamp DESC, l.id DESC;
>>
>> SELECT l.*
>> FROM log l
>> WHERE ((l.entity_cls = 'Installation) AND (l.entity_id IN ( SELECT 
>> i.id FROM installation.id WHERE (l.license_id = ?)
>> )))
>> ORDER BY l.create_timestamp DESC, l.id DESC;
>>
>> Unbelievable: The 1st query took ~0.15s while the 2nd query took ~0.2s.
>> As the sub-select seems not to have any effect how fast or slow the query 
>> is, that can be ignored.
>> In all cases I only have columns in my WHERE clause which are in the indexes.
>> But somehow Apache Derby seems not to notice it and not using them, if the 
>> WHERE clause gets too complicated.
>>
>> Is there anything I haven't seen? Anything I'm doing wrong by structuring 
>> the query like I did?
>> Somehow I don't get it why the execution time gets 100 times slower just by 
>> having everything in one query.
>>
>> Regards,
>> Gerrit
>>

AW: AW: AW: Questions about query execution and optimization

Reply via email to