Thanks to everybody for their help earlier today.

As promised here's the results of our various tests. Hopefully they may be of use to somebody...


We decided to start from a known position and so recreated the original index with the collation in it. We know this was sub optimal but its our reference point. We have the bytecode output if anybody wants to see it.

CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);

We ran the the following SQL twice

echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null

and the two runs totalled 46 mins. Each was actually 23 mins.

We then dropped the old index, built the new one

echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | sqlite3 tfl.sqlite

We ran

echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null

twice and each run as 12 mins. So we were twice as quick, which is nice.

We then ran

echo "explain select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite

twice. Each run was around 11.5 mins. We're not going to get into differences of less than a minute on a run of this size, so we'll say they are about the same speed.

Interesting results, clearly the collation does make a big difference. We are now going to go through the schema and check if we have made the same mistake elsewhere.

Thanks for your help, we can post the bytecode it people are interested.

Rob

On 17 Mar 2017, at 11:41, Rob Willett wrote:

Gunter,

I would never presume to describe anybody as a Nerd!

We're just going back to very first position with the 'bad' collation index so we can do proper timings as we change things so we understand the speed up (we hope there is a speed up)

We've written a quick script to check each version. Once we've put the original index back in, we've added a step to generate the SQLite bytecode for you. It's the least we can do...

We'll post this when its completed but we suspect it may take most of the day now :)

echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);'" echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null
date

echo "-------"

echo "Creating new index without collation"
echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite
echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | sqlite3 tfl.sqlite echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null
date

echo "-------"

echo "Trying SELECT statement with UNION ALL"
echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null
date


On 17 Mar 2017, at 11:30, Hick Gunter wrote:

Nerds with chronic byte code affinity like myself would like to see the output of "explain" (without "query plan"), i.e. the SQLite bytecode produced. I guess the query with OR will have a subprogram called once for each status value, whereas I expect the query with UNION ALL to have 2 copies of the search (which would not affect the run time) and maybe even a temporary table of results (which would take longer and use more memory).

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Rob Willett
Gesendet: Freitag, 17. März 2017 12:19
An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

Gunter, Simon,

Thanks for the replies, You both seem to be drilling into the collation sequence as a possible issue. We now have a new index and we have just run the query again

sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * from Disruptions where status = 2 OR
status = 6;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

So we have a different response from the query planner, which I think is good.

If we use your other example

sqlite> explain query plan select * from Disruptions where status = 2
UNION ALL select * from Disruptions where status = 6;
selectid|order|from|detail
1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?) 2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

I'm not sure which query is going to be faster. We'll have to try and see.

Your last suggestion of "select * from Disruptions where status =2 COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically equivalent to "explain query plan select * from Disruptions where status = 2 OR status = 6;" now we have removed the collation from the index.

sqlite> explain query plan select * from Disruptions  where status =2
COLLATE NOCASE or status = 6 COLLATE NOCASE;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

I'll check if we require all the fields, we require many (which I agree is not all) of the fields. Following this logic through, does this mean that it will do more file access bringing the records in from the file system?

The collation issue seems to be an artifact of the way Navcat for SQLite works. I suspect we need to be more careful about how we use the tool.

We'll now time the results of each query and run them twice to see the affect. No idea how long this will take but suspect a few hours :) I will post back the results as other people may (or may not) find this helpful.

Thanks

Rob

On 17 Mar 2017, at 10:57, Hick Gunter wrote:

On 17 Mar 2017, at 10:20am, Rob Willett
<rob.sql...@robertwillett.com> wrote:

CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE
NOCASE ASC);

[…]

As part of the larger more complex query, we are executing the query

```
select * from Disruptions where status = 2 OR status = 6;

The schema for the table says that "status" is INTEGER.
You are supplying numbers as arguments.
Those two match and should create no problem.

But your index has a collation order which is usually used for text.
I don’t see that it is obviously wrong, but it does look a little
weird.

Try creating another index which is just on "status", without the
COLLATE clause.
Then do another ANALYZE, then try the SELECT again.

Simon.

If the index is deemed unsuitable by SQLite due to its collation
sequence, then I expect it qwould also be ignored in "select ...
status=1" (without the second ORed value)

If not, then (select ... where status =2 UNION ALL select where status
= 6) should do the trick

Do you really require all the fields from Disruptions?

And yes, collating integers with NOCASE seems quite strange (there are no capital or lowercase numbers unless you are using roman numerals ;)
); for text affinity, it should render the comparison operators
caseblind, just like "like".


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or
dissemination of this communication is strictly prohibited. If you
have received this communication in error, please immediately notify
the sender by return e-mail message and delete all copies of the
original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to