Thanks to everybody for their help earlier today.
As promised here's the results of our various tests. Hopefully they may
be of use to somebody...
We decided to start from a known position and so recreated the original
index with the collation in it. We know this was sub optimal but its our
reference point. We have the bytecode output if anybody wants to see it.
CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE
ASC);
We ran the the following SQL twice
echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null
and the two runs totalled 46 mins. Each was actually 23 mins.
We then dropped the old index, built the new one
echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' |
sqlite3 tfl.sqlite
We ran
echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null
twice and each run as 12 mins. So we were twice as quick, which is nice.
We then ran
echo "explain select * from Disruptions where status = 2 UNION ALL
select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite
twice. Each run was around 11.5 mins. We're not going to get into
differences of less than a minute on a run of this size, so we'll say
they are about the same speed.
Interesting results, clearly the collation does make a big difference.
We are now going to go through the schema and check if we have made the
same mistake elsewhere.
Thanks for your help, we can post the bytecode it people are interested.
Rob
On 17 Mar 2017, at 11:41, Rob Willett wrote:
Gunter,
I would never presume to describe anybody as a Nerd!
We're just going back to very first position with the 'bad' collation
index so we can do proper timings as we change things so we understand
the speed up (we hope there is a speed up)
We've written a quick script to check each version. Once we've put the
original index back in, we've added a step to generate the SQLite
bytecode for you. It's the least we can do...
We'll post this when its completed but we suspect it may take most of
the day now :)
echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions
("status" COLLATE NOCASE ASC);'"
echo "explain select * from Disruptions where status = 2 OR status =
6;" | sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null
date
echo "-------"
echo "Creating new index without collation"
echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite
echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' |
sqlite3 tfl.sqlite
echo "explain select * from Disruptions where status = 2 OR status =
6;" | sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null
date
echo "-------"
echo "Trying SELECT statement with UNION ALL"
echo "explain select * from Disruptions where status = 2 OR status =
6;" | sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 UNION ALL select *
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 UNION ALL select *
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null
date
On 17 Mar 2017, at 11:30, Hick Gunter wrote:
Nerds with chronic byte code affinity like myself would like to see
the output of "explain" (without "query plan"), i.e. the SQLite
bytecode produced. I guess the query with OR will have a subprogram
called once for each status value, whereas I expect the query with
UNION ALL to have 2 copies of the search (which would not affect the
run time) and maybe even a temporary table of results (which would
take longer and use more memory).
-----Ursprüngliche Nachricht-----
Von: sqlite-users
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von
Rob Willett
Gesendet: Freitag, 17. März 2017 12:19
An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index
question ...
Gunter, Simon,
Thanks for the replies, You both seem to be drilling into the
collation sequence as a possible issue. We now have a new index and
we have just run the query again
sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * from Disruptions where status = 2
OR
status = 6;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4
(status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>
So we have a different response from the query planner, which I think
is good.
If we use your other example
sqlite> explain query plan select * from Disruptions where status =
2
UNION ALL select * from Disruptions where status = 6;
selectid|order|from|detail
1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4
(status=?)
2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4
(status=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
I'm not sure which query is going to be faster. We'll have to try and
see.
Your last suggestion of "select * from Disruptions where status =2
COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically
equivalent to "explain query plan select * from Disruptions where
status = 2 OR status = 6;" now we have removed the collation from the
index.
sqlite> explain query plan select * from Disruptions where status =2
COLLATE NOCASE or status = 6 COLLATE NOCASE;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4
(status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>
I'll check if we require all the fields, we require many (which I
agree is not all) of the fields. Following this logic through, does
this mean that it will do more file access bringing the records in
from the file system?
The collation issue seems to be an artifact of the way Navcat for
SQLite works. I suspect we need to be more careful about how we use
the tool.
We'll now time the results of each query and run them twice to see
the affect. No idea how long this will take but suspect a few hours
:) I will post back the results as other people may (or may not) find
this helpful.
Thanks
Rob
On 17 Mar 2017, at 10:57, Hick Gunter wrote:
On 17 Mar 2017, at 10:20am, Rob Willett
<rob.sql...@robertwillett.com> wrote:
CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE
NOCASE ASC);
[…]
As part of the larger more complex query, we are executing the
query
```
select * from Disruptions where status = 2 OR status = 6;
The schema for the table says that "status" is INTEGER.
You are supplying numbers as arguments.
Those two match and should create no problem.
But your index has a collation order which is usually used for
text.
I don’t see that it is obviously wrong, but it does look a little
weird.
Try creating another index which is just on "status", without the
COLLATE clause.
Then do another ANALYZE, then try the SELECT again.
Simon.
If the index is deemed unsuitable by SQLite due to its collation
sequence, then I expect it qwould also be ignored in "select ...
status=1" (without the second ORed value)
If not, then (select ... where status =2 UNION ALL select where
status
= 6) should do the trick
Do you really require all the fields from Disruptions?
And yes, collating integers with NOCASE seems quite strange (there
are
no capital or lowercase numbers unless you are using roman numerals
;)
); for text affinity, it should render the comparison operators
caseblind, just like "like".
___________________________________________
Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at
This communication (including any attachments) is intended for the
use
of the intended recipient(s) only and may contain information that
is
confidential, privileged or legally protected. Any unauthorized use
or
dissemination of this communication is strictly prohibited. If you
have received this communication in error, please immediately notify
the sender by return e-mail message and delete all copies of the
original communication. Thank you for your cooperation.
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___________________________________________
Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at
This communication (including any attachments) is intended for the
use of the intended recipient(s) only and may contain information
that is confidential, privileged or legally protected. Any
unauthorized use or dissemination of this communication is strictly
prohibited. If you have received this communication in error, please
immediately notify the sender by return e-mail message and delete all
copies of the original communication. Thank you for your cooperation.
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users