Yesterday or IRC, Bert, Philip and I were chatting about our SQLite perf issues and how Philip's findings in the past suggested that SQLite wasn't using its indices to optimize our queries.
After searching and discussing its documentation, Philip suggested the -too obvious- "maybe we have the wrong indices". So, I went to work with his "fake database generator script" (attached as "test.py"). The type of query we're seeing problematic performance with looks like the one below. The essential part is the WHERE clause. SELECT * FROM nodes WHERE wc_id = 1 AND (local_relpath = 'foo' OR local_relpath like 'foo%'); We discussed 3 ways to achieve the effect of this query: 1. The query itself 2. The query stated as a UNION of two queries 3. Running the two parts of the UNION manually ourselves. Ad (1) This query doesn't perform as we had hoped to get from using a database. Ad (2) In the past, UNIONs have been explicitly removed because they were creating temporary tables (on disk!). However, since then we have changed our SQLite setup to create temporary tables in memory, so the option should really be re-evaluated. Ad (3) I'd hate to have to use two queries in all places in our source where we want to run queries like these. As a result, I think this scenario should be avoided if we can. So, I've created 'perf.py' to evaluate each of these scenarios, researching the effect on each of them under the influence of adding different indices. This is my finding: Scenario (1) [an AND combined with a complex OR] doesn't perform well under any circumstance. Scenario (2) performs differently, depending on the available indices. Scenario (3) performs roughly equal to scenario (2). Scenario (2) takes ~0.27 seconds to evaluate in the unmodified database. Adding an index on (wc_id, local_relpath) makes the execution time drop to ~0.000156 seconds! Seems Philip was right :-) We need to carefully review the indices we have in our database to support good performance. Bye, Erik.