Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

Rob Willett Fri, 17 Mar 2017 10:31:33 -0700

Thanks to everybody for their help earlier today.

As promised, here are the results of our various tests. Hopefully they may be of use to somebody...

We decided to start from a known position and so recreated the original index with the collation in it. We know this was suboptimal but it's our reference point. We have the bytecode output if anybody wants to see it.

CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);


We ran the following SQL twice

echo "select * from Disruptions where status = 2 OR status = 6;" |sqlite3 tfl.sqlite > /dev/null


and the two runs totalled 46 mins. Each was actually 23 mins.

We then dropped the old index, built the new one

echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' |sqlite3 tfl.sqlite


We ran

echo "select * from Disruptions where status = 2 OR status = 6;" |sqlite3 tfl.sqlite > /dev/null


twice and each run was 12 mins. So we were twice as quick, which is nice.

We then ran

echo "explain select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite

twice. Each run was around 11.5 mins. We're not going to get into differences of less than a minute on a run of this size, so we'll say they are about the same speed.

Interesting results: clearly the collation does make a big difference. We are now going to go through the schema and check whether we have made the same mistake elsewhere.
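
For anyone who wants to see the effect without a 23-minute wait, here is a minimal sketch using Python's standard sqlite3 module. The three-column table and its rows are invented stand-ins for the real Disruptions schema; only the two index definitions and the query come from this thread. It prints the query plan under each index. The exact plan wording differs between SQLite versions, so treat the output as indicative only.

```python
import sqlite3

QUERY = "select * from Disruptions where status = 2 OR status = 6"
NOCASE_INDEX = 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC)'
PLAIN_INDEX = 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status")'


def plan_for(index_ddl, query=QUERY):
    """Build a throwaway in-memory table with one index and return its plan details."""
    con = sqlite3.connect(":memory:")
    # Assumed, cut-down schema: the real table has many more columns.
    con.execute(
        "CREATE TABLE Disruptions (id INTEGER PRIMARY KEY, status INTEGER, note TEXT)"
    )
    con.executemany(
        "INSERT INTO Disruptions (status, note) VALUES (?, ?)",
        [(i % 10, "row %d" % i) for i in range(1000)],
    )
    con.execute(index_ddl)
    rows = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    con.close()
    # The human-readable detail is the last column in both old and new SQLite versions.
    return [row[-1] for row in rows]


if __name__ == "__main__":
    for label, ddl in (("NOCASE index", NOCASE_INDEX), ("plain index", PLAIN_INDEX)):
        print(label)
        for detail in plan_for(ddl):
            print("   ", detail)
```

With the plain index we would expect a SEARCH line naming Disruptions_idx4, much like the EXPLAIN QUERY PLAN output quoted further down the thread. With the NOCASE index the planner is likely to pass the index over, because the comparison on an INTEGER column uses the default BINARY collation.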


Thanks for your help. We can post the bytecode if people are interested.

Rob

On 17 Mar 2017, at 11:41, Rob Willett wrote:

Gunter,

I would never presume to describe anybody as a Nerd!
We're just going back to the very first position with the 'bad' collation index so we can do proper timings as we change things, so we understand the speed-up (we hope there is a speed-up).
We've written a quick script to check each version. Once we've put the original index back in, we've added a step to generate the SQLite bytecode for you. It's the least we can do...
We'll post this when it's completed but we suspect it may take most of the day now :)
echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);'"
echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 OR status = 6;" |sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 OR status = 6;" |sqlite3 tfl.sqlite > /dev/null
date

echo "-------"

echo "Creating new index without collation"
echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite
echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' | sqlite3 tfl.sqlite
echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 OR status = 6;" |sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 OR status = 6;" |sqlite3 tfl.sqlite > /dev/null
date

echo "-------"

echo "Trying SELECT statement with UNION ALL"
echo "explain select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null
date
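
The same run-twice-and-time idea can be sketched in Python for readers without the sqlite3 shell to hand. This is only an illustration: the in-memory table, its rows and the helper names (demo_db, time_query) are assumptions and not part of the original script. On a table this small the timings say nothing about the 12-minute runs above.

```python
import sqlite3
import time

OR_QUERY = "select * from Disruptions where status = 2 OR status = 6"
UNION_QUERY = (
    "select * from Disruptions where status = 2 "
    "UNION ALL select * from Disruptions where status = 6"
)


def demo_db():
    """Small in-memory stand-in for tfl.sqlite (assumed schema, invented rows)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Disruptions (id INTEGER PRIMARY KEY, status INTEGER, note TEXT)"
    )
    con.executemany(
        "INSERT INTO Disruptions (status, note) VALUES (?, ?)",
        [(i % 10, "row %d" % i) for i in range(1000)],
    )
    con.execute('CREATE INDEX "Disruptions_idx4" ON Disruptions ("status")')
    return con


def time_query(con, sql, runs=2):
    """Run sql `runs` times, discarding the rows; return (row_count, [seconds, ...])."""
    timings = []
    row_count = 0
    for _ in range(runs):
        start = time.perf_counter()
        row_count = sum(1 for _ in con.execute(sql))  # fetch every row, keep none
        timings.append(time.perf_counter() - start)
    return row_count, timings


if __name__ == "__main__":
    con = demo_db()
    for label, sql in (("OR", OR_QUERY), ("UNION ALL", UNION_QUERY)):
        count, timings = time_query(con, sql)
        print(label, count, ["%.4fs" % t for t in timings])
```

Running each statement twice, as the script above does, makes it easier to spot a first run that was slowed by a cold file cache.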


On 17 Mar 2017, at 11:30, Hick Gunter wrote:
Nerds with chronic byte code affinity like myself would like to see the output of "explain" (without "query plan"), i.e. the SQLite bytecode produced. I guess the query with OR will have a subprogram called once for each status value, whereas I expect the query with UNION ALL to have 2 copies of the search (which would not affect the run time) and maybe even a temporary table of results (which would take longer and use more memory).
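
A rough way to compare the two bytecode programs side by side, sketched with Python's sqlite3 module. The table, its rows and the helper names are assumptions made for illustration, and the opcodes emitted depend on the SQLite version, so this shows how to look rather than what will be found.

```python
import sqlite3
from collections import Counter

OR_QUERY = "select * from Disruptions where status = 2 OR status = 6"
UNION_QUERY = (
    "select * from Disruptions where status = 2 "
    "UNION ALL select * from Disruptions where status = 6"
)


def demo_db():
    """In-memory stand-in for the real database (assumed schema, invented rows)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Disruptions (id INTEGER PRIMARY KEY, status INTEGER, note TEXT)"
    )
    con.executemany(
        "INSERT INTO Disruptions (status, note) VALUES (?, ?)",
        [(i % 10, "row %d" % i) for i in range(1000)],
    )
    con.execute('CREATE INDEX "Disruptions_idx4" ON Disruptions ("status")')
    return con


def opcodes(con, sql):
    """Return the opcode names of the bytecode program SQLite prepares for sql."""
    # Each EXPLAIN row is (addr, opcode, p1, p2, p3, p4, p5, comment).
    return [row[1] for row in con.execute("EXPLAIN " + sql)]


if __name__ == "__main__":
    con = demo_db()
    for label, sql in (("OR", OR_QUERY), ("UNION ALL", UNION_QUERY)):
        ops = opcodes(con, sql)
        print(label, len(ops), "instructions")
        print("   ", dict(Counter(ops)))
```

Counting opcodes such as OpenRead or OpenEphemeral in each program is one way to check the guess about duplicated searches and temporary tables.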
-----Original Message-----
From: sqlite-users [mailto:[email protected]] On Behalf Of Rob Willett
Sent: Friday, 17 March 2017 12:19
To: SQLite mailing list <[email protected]>
Subject: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
Gunter, Simon,
Thanks for the replies. You both seem to be drilling into the collation sequence as a possible issue. We now have a new index and we have just run the query again
sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * from Disruptions where status = 2 OR status = 6;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>
So we have a different response from the query planner, which I think is good.
If we use your other example
sqlite> explain query plan select * from Disruptions where status = 2 UNION ALL select * from Disruptions where status = 6;
selectid|order|from|detail
1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
I'm not sure which query is going to be faster. We'll have to try and see.
Your last suggestion of "select * from Disruptions where status = 2 COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically equivalent to "explain query plan select * from Disruptions where status = 2 OR status = 6;" now that we have removed the collation from the index.
sqlite> explain query plan select * from Disruptions where status = 2 COLLATE NOCASE or status = 6 COLLATE NOCASE;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>
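
The "logically equivalent" point can be checked on a toy table. The sketch below assumes a two-column stand-in for Disruptions with invented rows. It compares the rows returned with and without COLLATE NOCASE on the integer comparison, then shows the same clause changing the result for text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed stand-in table; the real Disruptions schema is not shown in this thread.
con.execute("CREATE TABLE Disruptions (id INTEGER PRIMARY KEY, status INTEGER)")
con.executemany(
    "INSERT INTO Disruptions (status) VALUES (?)", [(i % 10,) for i in range(100)]
)


def ids(where):
    """Return the ids matching a WHERE clause, in id order."""
    sql = "select id from Disruptions where " + where + " order by id"
    return [row[0] for row in con.execute(sql)]


def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return con.execute(sql).fetchone()[0]


# Integer comparisons: NOCASE has nothing to fold, so the same rows come back.
plain = ids("status = 2 OR status = 6")
nocase = ids("status = 2 COLLATE NOCASE or status = 6 COLLATE NOCASE")

# Text comparisons: the default BINARY collation is case-sensitive, NOCASE is not.
text_binary = scalar("select 'abc' = 'ABC'")
text_nocase = scalar("select 'abc' = 'ABC' COLLATE NOCASE")

if __name__ == "__main__":
    print("same rows:", plain == nocase, "count:", len(plain))
    print("text BINARY:", text_binary, "text NOCASE:", text_nocase)
```

This only demonstrates that the results match. Whether the planner picks the same index for both forms is a separate question, which the EXPLAIN QUERY PLAN output above answers for this database.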
I'll check if we require all the fields; we require many (which I agree is not all) of the fields. Following this logic through, does this mean that it will do more file access bringing the records in from the file system?
The collation issue seems to be an artifact of the way Navicat for SQLite works. I suspect we need to be more careful about how we use the tool.
We'll now time the results of each query and run them twice to see the effect. No idea how long this will take but suspect a few hours :) I will post back the results as other people may (or may not) find this helpful.
Thanks

Rob

On 17 Mar 2017, at 10:57, Hick Gunter wrote:
On 17 Mar 2017, at 10:20am, Rob Willett <[email protected]> wrote:
CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);

[…]
As part of the larger more complex query, we are executing the query
select * from Disruptions where status = 2 OR status = 6;
The schema for the table says that "status" is INTEGER.
You are supplying numbers as arguments.
Those two match and should create no problem.
But your index has a collation order which is usually used for text.
I don’t see that it is obviously wrong, but it does look a little
weird.

Try creating another index which is just on "status", without the
COLLATE clause.
Then do another ANALYZE, then try the SELECT again.

Simon.
If the index is deemed unsuitable by SQLite due to its collation
sequence, then I expect it would also be ignored in "select ...
status=1" (without the second ORed value).
If not, then (select ... where status = 2 UNION ALL select ... where
status = 6) should do the trick.

Do you really require all the fields from Disruptions?
And yes, collating integers with NOCASE seems quite strange (there are
no capital or lowercase numbers unless you are using roman numerals ;)
); for text affinity, it should render the comparison operators
case-blind, just like "like".


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [email protected]
This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users