On Fri, Mar 25, 2011 at 5:24 PM, Eric Minbiole <[email protected]> wrote:
> We have an application that has been using SQLite successfully for several
> years. We use SQLite for various purposes within our product, with a few
> dozen tables overall.
>
> Recently, I started to upgrade from version 3.6.3 to 3.7.5. During that
> time, I noticed that several previously fast indexed queries turned into
> slow full table scans. For example, on a simple (hypothetical) example:
>
> CREATE TABLE tbl (
> id INTEGER,
> value INTEGER
> );
> CREATE INDEX idx_id ON tbl (id);
>
> The query "SELECT * FROM tbl WHERE id = 1" previously made use of idx_id.
> However, in 3.7.5, we noticed that it was doing a full table scan, causing
> significant performance issues.
>
> After some debugging, found that the issue is related to our (mis-)use of
> the "analyze" command: Historically, our software runs analyze any time
> the
> db schema changes. However, if our customer had not yet made use of a
> particular feature, the corresponding table(s) might be _empty_ when
> analyze
> is run. On previous versions, this did not cause any obvious problems. In
> the new version, the query planner reasonably assumes that a full table
> scan
> is faster than an index on a small/empty table. However, when the customer
> later makes use of those features (populating the tables), the queries
> become quite slow, as it still does a full table scan.
>
> I'm trying to determine the best way to resolve this issue:
>
Suggestion: After you run ANALYZE, go back and modify values in
sqlite_stat1 (and sqlite_stat2 if you are using SQLITE_ENABLE_STAT2) with
"typical" values that you would expect to see in most databases. Yes, you
can read and write the sqlite_stat1 and sqlite_stat2 tables. Maybe you can
come up with some prepackaged default values for sqlite_stat1 and
sqlite_stat2 based on experiments in your development lab, and then just put
your prepackaged defaults into the real sqlite_stat1 and sqlite_stat2 after
running ANALYZE.
>
> - Re-running analyze after adding data to the table is an obvious
> suggestion. However, as we have lots of tables in use for various
> purposes,
> I'd need to sprinkle lots of "if (first time data added) analyze()" code
> around.
> - I could add a user triggered "maintenance" feature to manually
> re-analyze,
> but that takes away some of the "zero configuration" benefits of our
> product.
> - Even if I stop calling analyze at all going forward, there may be some
> existing (empty) tables that may have already been analyzed by previous
> code.
> - I could make use of the "INDEXED BY" clause, but this goes against the
> documented intent of this feature ("*not* intended for use in tuning the
> performance of a query"), and requires re-writing of many queries.
>
> I'm sure there are other good ideas-- I'm flexible, and open to suggestion.
> I'd appreciate any suggestions the group might have.
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
--
D. Richard Hipp
[email protected]
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users