Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

2014-09-21 Thread RSmith


On 2014/09/21 15:39, Merike wrote:
Now I could very well be wrong about that as you say in your other reply that "It might simply be that Analyze did not get your QP 
to react on that size DB as it did for us". You seem to be saying that analyze behaves differently depending on database size...


This is very much what I'm saying but maybe not exactly like you think. ANALYZE really interprets the "Shape" of your data (average 
width, cardinality and so on) and from that provides figures to the Query Planner by which to make decisions. For this reason, 
running ANALYZE on any amount of databases, different sizes, different kinds of data, will all provide different figures to the 
Query planner and it is very possible to produce a set of figures running analyze on one DB which changes the QP's mind on what plan 
to follow, whereas the very same Schema DB but with other data in it might make the QP choose a different plan - either of which may 
or may not hide the bug in question.


Glad to hear the trunk fixed your problem though - that will be standard in the 
next release.

Cheers!
Ryan



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

2014-09-21 Thread Merike
21.09.2014 00:08, Richard Hipp kirjutas:
> On Sat, Sep 20, 2014 at 12:45 PM, Merike  wrote:
>
>> 19.09.2014 04:21, Richard Hipp kirjutas:
>>> A simple script to reproduce the problem in the latest SQLite is as
>>> follows: CREATE TABLE t1(a INTEGER PRIMARY KEY, b INTEGER, c INTEGER,
>>> d INTEGER); CREATE INDEX t1b ON t1(b); CREATE TABLE t2(x INTEGER
>>> PRIMARY KEY, y); explain query plan SELECT * FROM t1, t2 WHERE x=c AND
>>> b=?1 ORDER BY a; .print -- CREATE INDEX t1bd ON
>>> t1(b,d); explain query plan SELECT * FROM t1, t2 WHERE x=c AND b=?1
>>> ORDER BY a; If you copy/paste the above script into an sqlite3
>>> command-line shell, you'll see that, for some reason, the query
>>> planner decides to sort manually rather than using an index for
>>> sorting after the t1bd index is added. We are still investigating to
>>> try understand why that is. Please note that if you run ANALYZE on
>>> your sample database, the query is fast again. On my (Ubuntu) desktop,
>>> the query takes 648 milliseconds before being analyzed and 600
>>> microseconds afterwards - a 1000-fold speedup.
>> A question: is the query being fast again after analyze call indicative
>> of the bug being fixed? Because I tried it on my original database too
>> and there I don't see a speedup after analyze. Should I try to minimize
>> it to a smaller database again where the bug still occurs, even after
>> analyze? Or will the change you made fix my original database speed as
>> well despite the analyze call not helping it?
>>
> The change fixes the problem (for us) *without* requiring ANALYZE.
>
Right, that's not what I was trying to ask so I'll try to explain again.

Is there a way to tell if this fix helps the query speed only on the
example database I provided or also the original one without waiting for
a new release of sqlite? Since you said that analyze helps the example
database even without code fix I tried running analyze on original
database too, hoping it might speed up that as well. And there it did
not help. This might mean nothing, I don't know sqlite well enough to
tell, but could it also mean that the fix (when released) might not
speed up the same query on my original database? The table structure is
the same there but data is different and there's more of it. It could
very well be that waiting for the release is the only way to find it out
as I'm not interested enough to try to build sqlite myself but I was
trying to figure out if there's another way :) Otherwise there could be
a chance that next release comes around and might not help the query on
actual database and then I would need to post here again and wait for
another release.

Merike
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

2014-09-21 Thread Merike
21.09.2014 15:12, RSmith kirjutas:
> Merike: Running Analyze did not fix the bug, it simply changed some
> internal-use values that allowed the bug to be circumvented. If you
> re-make a table as you had without running analyze, the problem will
> surely remain using the same codebase. Updating your SQLite code from
> the Trunk specified by Richard will fix the bug for real, meaning that
> analyze would make no real difference (for the specifed use case).
> Naturally the trunk is also included in the next release which is
> probably due within the next month at which time the bug fix will be
> standard across all released versions.
So, as I understand it analyze allows the bug to be circumvented on
example database without upgrading sqlite itself. What I'm seeing with
original database is that analyze does not allow circumvention of the
bug there. Which could mean that the bug fixed is not the only one
affecting original database even if it's the only one affecting example
one. If both of the databases were affected by the exact same bug I
would expect that analyze would allow circumvention in both cases, not
just with example database.

Now I could very well be wrong about that as you say in your other reply
that "It might simply be that Analyze did not get your QP to react on
that size DB as it did for us". You seem to be saying that analyze
behaves differently depending on database size. This would explain why
analyze didn't allow circumvention on original database. In terms of
size "backlog" table has 160 472 rows in the example I provided while
originally it has ~1 100 000 rows.

As I discovered it's much simpler to build sqlite than I expected I just
checked original database with new version. And the query was fast there
so the fix actually works on original database as well :)

Merike
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

2014-09-21 Thread RSmith


On 2014/09/21 14:12, RSmith wrote:


On 2014/09/20 23:08, Richard Hipp wrote:

On Sat, Sep 20, 2014 at 12:45 PM, Merike  wrote:


A question: is the query being fast again after analyze call indicative
of the bug being fixed? Because I tried it on my original database too
and there I don't see a speedup after analyze. Should I try to minimize
it to a smaller database again where the bug still occurs, even after
analyze? Or will the change you made fix my original database speed as
well despite the analyze call not helping it?


The change fixes the problem (for us) *without* requiring ANALYZE.



Richard, I think the OP meant to ask whether or not running analyze fixed the bug, maybe a slight misunderstanding and you 
probably assumed correct understanding as a benefit of the doubt scenario - so allow me to just add the following in case the OP 
did think of it the other way...


On second thought, I think the first interpretation was correct. Merike - just so I understand clearly - you DID download and is 
using the new trunk yes? If so, how big is the old DB you tried it on? Could you try the minimize and send the file via a 
file-upload service?
It might simply be that Analyze did not get your QP to react on that size DB as it did for us - but if the bug is not fixed it needs 
to be seen.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

2014-09-21 Thread RSmith


On 2014/09/20 23:08, Richard Hipp wrote:

On Sat, Sep 20, 2014 at 12:45 PM, Merike  wrote:


A question: is the query being fast again after analyze call indicative
of the bug being fixed? Because I tried it on my original database too
and there I don't see a speedup after analyze. Should I try to minimize
it to a smaller database again where the bug still occurs, even after
analyze? Or will the change you made fix my original database speed as
well despite the analyze call not helping it?


The change fixes the problem (for us) *without* requiring ANALYZE.



Richard, I think the OP meant to ask whether or not running analyze fixed the bug, maybe a slight misunderstanding and you probably 
assumed correct understanding as a benefit of the doubt scenario - so allow me to just add the following in case the OP did think of 
it the other way:


Merike: Running Analyze did not fix the bug, it simply changed some internal-use values that allowed the bug to be circumvented. If 
you re-make a table as you had without running analyze, the problem will surely remain using the same codebase. Updating your SQLite 
code from the Trunk specified by Richard will fix the bug for real, meaning that analyze would make no real difference (for the 
specifed use case). Naturally the trunk is also included in the next release which is probably due within the next month at which 
time the bug fix will be standard across all released versions.



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

2014-09-20 Thread Richard Hipp
On Sat, Sep 20, 2014 at 12:45 PM, Merike  wrote:

> 19.09.2014 04:21, Richard Hipp kirjutas:
> > A simple script to reproduce the problem in the latest SQLite is as
> > follows: CREATE TABLE t1(a INTEGER PRIMARY KEY, b INTEGER, c INTEGER,
> > d INTEGER); CREATE INDEX t1b ON t1(b); CREATE TABLE t2(x INTEGER
> > PRIMARY KEY, y); explain query plan SELECT * FROM t1, t2 WHERE x=c AND
> > b=?1 ORDER BY a; .print -- CREATE INDEX t1bd ON
> > t1(b,d); explain query plan SELECT * FROM t1, t2 WHERE x=c AND b=?1
> > ORDER BY a; If you copy/paste the above script into an sqlite3
> > command-line shell, you'll see that, for some reason, the query
> > planner decides to sort manually rather than using an index for
> > sorting after the t1bd index is added. We are still investigating to
> > try understand why that is. Please note that if you run ANALYZE on
> > your sample database, the query is fast again. On my (Ubuntu) desktop,
> > the query takes 648 milliseconds before being analyzed and 600
> > microseconds afterwards - a 1000-fold speedup.
>
> A question: is the query being fast again after analyze call indicative
> of the bug being fixed? Because I tried it on my original database too
> and there I don't see a speedup after analyze. Should I try to minimize
> it to a smaller database again where the bug still occurs, even after
> analyze? Or will the change you made fix my original database speed as
> well despite the analyze call not helping it?
>

The change fixes the problem (for us) *without* requiring ANALYZE.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

2014-09-20 Thread Merike
19.09.2014 04:21, Richard Hipp kirjutas:
> A simple script to reproduce the problem in the latest SQLite is as
> follows: CREATE TABLE t1(a INTEGER PRIMARY KEY, b INTEGER, c INTEGER,
> d INTEGER); CREATE INDEX t1b ON t1(b); CREATE TABLE t2(x INTEGER
> PRIMARY KEY, y); explain query plan SELECT * FROM t1, t2 WHERE x=c AND
> b=?1 ORDER BY a; .print -- CREATE INDEX t1bd ON
> t1(b,d); explain query plan SELECT * FROM t1, t2 WHERE x=c AND b=?1
> ORDER BY a; If you copy/paste the above script into an sqlite3
> command-line shell, you'll see that, for some reason, the query
> planner decides to sort manually rather than using an index for
> sorting after the t1bd index is added. We are still investigating to
> try understand why that is. Please note that if you run ANALYZE on
> your sample database, the query is fast again. On my (Ubuntu) desktop,
> the query takes 648 milliseconds before being analyzed and 600
> microseconds afterwards - a 1000-fold speedup.

A question: is the query being fast again after analyze call indicative
of the bug being fixed? Because I tried it on my original database too
and there I don't see a speedup after analyze. Should I try to minimize
it to a smaller database again where the bug still occurs, even after
analyze? Or will the change you made fix my original database speed as
well despite the analyze call not helping it?

Merike
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

2014-09-19 Thread Merike
19.09.2014 05:24, Richard Hipp kirjutas:
> Thanks again for the regression report.  This problem is now fixed on
> the SQLite trunk.  See
> http://www.sqlite.org/src/info/72727b68cd0796?dc=22 for the check-in
> that fixes the problem.
>
> If all you want to know is that the problem has been fixed, you may
> stop reading now.  If you are curious about the cause of the problem
> was, you may continue reading.
>
>
That's great to hear! Looking forward to next release :)

Merike
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

2014-09-18 Thread Richard Hipp
On Thu, Sep 18, 2014 at 8:54 AM, Merike  wrote:

> Hi everyone,
>
> Since upgrading to Kubuntu 14.04 I've had an issue with Quassel irc
> client being slow on startup when it retrieves backlog from database.
> I've tracked it down to different sqlite version being installed.
> Previously I had 3.7.17 and now have 3.8.2. I've tried various versions
> from https://launchpad.net/ubuntu/trusty/+source/sqlite3/+builds and can
> only pinpoint it to between 3.7.17 and 3.8.0.2 because there doesn't
> appear to be intermediate builds.
>

Thanks again for the regression report.  This problem is now fixed on the
SQLite trunk.  See
http://www.sqlite.org/src/info/72727b68cd0796?dc=22 for the check-in that
fixes the problem.

If all you want to know is that the problem has been fixed, you may stop
reading now.  If you are curious about the cause of the problem was, you
may continue reading.

Here is a simplified test case that illustrates the problem:

CREATE TABLE t1(x INTEGER PRIMARY KEY, a, b);
CREATE INDEX t1a ON t1(a);
CREATE INDEX t1ab ON t1(a,b);
explain query plan
SELECT * FROM t1 WHERE a=?1 ORDER BY x;

The query could use either index, t1a or t1ab, to lookup the appropriate
rows based on "WHERE a=?1".  If index t1a is used for the lookup, then the
rows will come out ordered by x, since every index contains the primary key
after the indexed columns.  It is as if the t1a index was really over two
columns, a and x, and the t1ab index was over three columns, a, b, and x.
So when t1a is used for lookup, SQLite searches for the first entry in the
index where a=?1 then it starts reading successive entries, as long as the
a=?1 condition holds, to find the primary key and hence to look up the
table content.  But notice how doing that extracts the rows in ascending
"x" order - exactly as requested by the ORDER BY clause.  SQLite recognizes
this and suppresses the sort operation since the rows are going to come out
in the correct order naturally.

But if t1ab is used to lookup the rows, there is that pesky 'b" column in
between the "a" which is constrained by "WHERE a=?" and the "x" column.
There might be two or more different "b" values for the same "a", and hence
the "x" values will not necessarily be in the right order.  So the output
must be sorted by a separate post-processing step.  That is much slower.

Normally, the query planner looks at all possible indexes and tries each
one to see which one will generate a plan with the least amount of work.
In that case, t1a would be selected over t1ab since t1a can omit the sort
operation at the end, which is usually one of the most expensive parts of a
query.  Unfortunately, the query planner (incorrectly) rejected the t1a
index in favor of t1ab very early in the query planning process.  There is
a prefilter that asks whether or not a index might give output in the ORDER
BY order naturally.  Such indexes are given special protection from early
rejection so that they can be thoroughly analyzed.  But in this case,
because the ORDER BY was on the INTEGER PRIMARY KEY rather than on a key
column, that prefilter failed to recognize that t1a might naturally render
the output in the desired order.  Hence, t1a ended up being rejected in
favor of t1ab, which is a covering index and hence involves less lookup
work.

The one-line change shown above fixes the prefilter so that it recognizes
that t1a might be useful in implementing the ORDER BY clause, thus giving
t1a protection from being overridden by t1ab.  Both t1a and t1ab indexes
have to compete in the final selection process which t1a clearly wins
because the satisfies the ORDER BY clause without having to do a separate
sort.




>
> I don't know what exact queries Quassel runs but most likely (after
> looking at some Quassel source on github) it's something like the
> following for a single channel:
> SELECT messageid, time, type, flags, sender, message
> FROM backlog
> JOIN sender ON backlog.senderid = sender.senderid
> WHERE bufferid = 102
> ORDER BY messageid DESC
> LIMIT 100;
>
> When I run this query on 3.7.17 it returns nearly instantly. When I do
> it on 3.8.0.2 or 3.8.6 which I also tried then it takes about 4 seconds
> on my not so fast spinning drive laptop. I've posted example database
> (minimized from original) at
> https://docs.google.com/file/d/0Bzx3gCDqfzVdcDNhdzlfVlh4ZTA/. Could
> someone either confirm or prove false that this query has become slower
> with newer versions?
>
> Merike
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression between 3.7.17 and 3.8.0.2?

2014-09-18 Thread Richard Hipp
On Thu, Sep 18, 2014 at 8:54 AM, Merike  wrote:

> Hi everyone,
>
> Since upgrading to Kubuntu 14.04 I've had an issue with Quassel irc
> client being slow on startup when it retrieves backlog from database.
> I've tracked it down to different sqlite version being installed.
> Previously I had 3.7.17 and now have 3.8.2. I've tried various versions
> from https://launchpad.net/ubuntu/trusty/+source/sqlite3/+builds and can
> only pinpoint it to between 3.7.17 and 3.8.0.2 because there doesn't
> appear to be intermediate builds.
>

Thanks for the performance regression report.

A simple script to reproduce the problem in the latest SQLite is as follows:

CREATE TABLE t1(a INTEGER PRIMARY KEY, b INTEGER, c INTEGER, d INTEGER);
CREATE INDEX t1b ON t1(b);
CREATE TABLE t2(x INTEGER PRIMARY KEY, y);
explain query plan
SELECT * FROM t1, t2 WHERE x=c AND b=?1 ORDER BY a;
.print --
CREATE INDEX t1bd ON t1(b,d);
explain query plan
SELECT * FROM t1, t2 WHERE x=c AND b=?1 ORDER BY a;


If you copy/paste the above script into an sqlite3 command-line shell,
you'll see that, for some reason, the query planner decides to sort
manually rather than using an index for sorting after the t1bd index is
added. We are still investigating to try understand why that is.

Please note that if you run ANALYZE on your sample database, the query is
fast again.  On my (Ubuntu) desktop, the query takes 648 milliseconds
before being analyzed and 600 microseconds afterwards - a 1000-fold speedup.



>
> I don't know what exact queries Quassel runs but most likely (after
> looking at some Quassel source on github) it's something like the
> following for a single channel:
> SELECT messageid, time, type, flags, sender, message
> FROM backlog
> JOIN sender ON backlog.senderid = sender.senderid
> WHERE bufferid = 102
> ORDER BY messageid DESC
> LIMIT 100;
>
> When I run this query on 3.7.17 it returns nearly instantly. When I do
> it on 3.8.0.2 or 3.8.6 which I also tried then it takes about 4 seconds
> on my not so fast spinning drive laptop. I've posted example database
> (minimized from original) at
> https://docs.google.com/file/d/0Bzx3gCDqfzVdcDNhdzlfVlh4ZTA/. Could
> someone either confirm or prove false that this query has become slower
> with newer versions?
>
> Merike
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users