Re: [sqlite] Why is this so much more efficient?

2017-02-16 Thread Barry Smith
Aliases in SQL are not the same as variables in most procedural languages. So 
every time you mention 'totaltime' SQLite is probably recalculating that value 
by adding all the columns together.  See the various discussions regarding no 
deterministic (random) functions last year. Less references to that alias => 
less calculations. Reference it twice and expect it to be slower.

> On 17 Feb 2017, at 12:27 AM, Darko Volaric  wrote:
> 
> You can actually index functions or expression:
> https://www.sqlite.org/expridx.html
> 
> On Thu, Feb 16, 2017 at 9:32 PM, Cecil Westerhof 
> wrote:
> 
>> 2017-02-16 21:10 GMT+01:00 Dominique Pellé :
>> 
>>> Cecil Westerhof  wrote:
>>> 
 I have a table vmstat that I use to store vmstat info. ;-)
 At the moment it has more as 661 thousand records.
 
 In principle the values of usertime, systemtime, idletime, waittime and
 stolentime should add up to 100. I just wanted to check it. Of-course
>>> there
 could be a rounding error, so I wrote the following query:
 SELECT date
 ,  time
 ,  usertime
 ,  systemtime
 ,  idletime
 ,  waittime
 ,  stolentime
 ,  (usertime + systemtime + idletime + waittime + stolentime) AS
 totaltime
 FROM   vmstat
 WHERE  totaltime  < 99 OR totaltime > 101
 
 I did not like that, so I rewrote the WHERE to:
 WHERE  ABS(100 - totaltime) > 1
 
 The funny thing the second WHERE is more efficient as the first, where
>> I
 would have expected it to be the other way around.
 The first takes around 1.050 milliseconds.
 The second takes around  950 milliseconds.
 So the second is around 10% more efficient. Why is this?
 
 In case it is important: I did this in sqlitebrowser 3.7.0, which uses
 SQLite 3.8.10.2.
>>> 
>>> 
>>> I suspect  that when you use "WHERE ABS(100 - totaltime) > 1"
>>> SQLite cannot use an index since it's an expression, so it does a
>>> full table scan, whereas the other solution which does
>>> "WHERE  totaltime  < 99 OR totaltime > 101"
>>> may use an index on totaltime (assuming that there is an index).
>>> 
>>> In general using an index is good.  But if most of the records
>>> satisfy the condition "ABS(100 - totaltime) > 1" then an index
>>> can be more harmful than useful.   And that could explain
>>> why using "WHERE ABS(100 - totaltime) > 1" is a bit faster.
>>> You could try "EXPLAIN QUERY PLAN" on your queries to
>>> see if they use an index or if they do a full table scan.
>> 
>> ​Totaltime is calculated, so it cannot have an index. ;-)
>> Besides from the almost 700.000 records only two satisfy the condition.
>> 
>> I should look into EXPLAIN QUERY PLAN.
>> 
>> --
>> Cecil Westerhof
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why is this so much more efficient?

2017-02-16 Thread Darko Volaric
You can actually index functions or expression:
https://www.sqlite.org/expridx.html

On Thu, Feb 16, 2017 at 9:32 PM, Cecil Westerhof 
wrote:

> 2017-02-16 21:10 GMT+01:00 Dominique Pellé :
>
> > Cecil Westerhof  wrote:
> >
> > > I have a table vmstat that I use to store vmstat info. ;-)
> > > At the moment it has more as 661 thousand records.
> > >
> > > In principle the values of usertime, systemtime, idletime, waittime and
> > > stolentime should add up to 100. I just wanted to check it. Of-course
> > there
> > > could be a rounding error, so I wrote the following query:
> > > SELECT date
> > > ,  time
> > > ,  usertime
> > > ,  systemtime
> > > ,  idletime
> > > ,  waittime
> > > ,  stolentime
> > > ,  (usertime + systemtime + idletime + waittime + stolentime) AS
> > > totaltime
> > > FROM   vmstat
> > > WHERE  totaltime  < 99 OR totaltime > 101
> > >
> > > I did not like that, so I rewrote the WHERE to:
> > > WHERE  ABS(100 - totaltime) > 1
> > >
> > > The funny thing the second WHERE is more efficient as the first, where
> I
> > > would have expected it to be the other way around.
> > > The first takes around 1.050 milliseconds.
> > > The second takes around  950 milliseconds.
> > > So the second is around 10% more efficient. Why is this?
> > >
> > > In case it is important: I did this in sqlitebrowser 3.7.0, which uses
> > > SQLite 3.8.10.2.
> >
> >
> > I suspect  that when you use "WHERE ABS(100 - totaltime) > 1"
> > SQLite cannot use an index since it's an expression, so it does a
> > full table scan, whereas the other solution which does
> > "WHERE  totaltime  < 99 OR totaltime > 101"
> > may use an index on totaltime (assuming that there is an index).
> >
> > In general using an index is good.  But if most of the records
> > satisfy the condition "ABS(100 - totaltime) > 1" then an index
> > can be more harmful than useful.   And that could explain
> > why using "WHERE ABS(100 - totaltime) > 1" is a bit faster.
> > You could try "EXPLAIN QUERY PLAN" on your queries to
> > see if they use an index or if they do a full table scan.
> >
>
> ​Totaltime is calculated, so it cannot have an index. ;-)
> Besides from the almost 700.000 records only two satisfy the condition.
>
> I should look into EXPLAIN QUERY PLAN.
>
> --
> Cecil Westerhof
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why is this so much more efficient?

2017-02-16 Thread Cecil Westerhof
2017-02-16 21:10 GMT+01:00 Dominique Pellé :

> Cecil Westerhof  wrote:
>
> > I have a table vmstat that I use to store vmstat info. ;-)
> > At the moment it has more as 661 thousand records.
> >
> > In principle the values of usertime, systemtime, idletime, waittime and
> > stolentime should add up to 100. I just wanted to check it. Of-course
> there
> > could be a rounding error, so I wrote the following query:
> > SELECT date
> > ,  time
> > ,  usertime
> > ,  systemtime
> > ,  idletime
> > ,  waittime
> > ,  stolentime
> > ,  (usertime + systemtime + idletime + waittime + stolentime) AS
> > totaltime
> > FROM   vmstat
> > WHERE  totaltime  < 99 OR totaltime > 101
> >
> > I did not like that, so I rewrote the WHERE to:
> > WHERE  ABS(100 - totaltime) > 1
> >
> > The funny thing the second WHERE is more efficient as the first, where I
> > would have expected it to be the other way around.
> > The first takes around 1.050 milliseconds.
> > The second takes around  950 milliseconds.
> > So the second is around 10% more efficient. Why is this?
> >
> > In case it is important: I did this in sqlitebrowser 3.7.0, which uses
> > SQLite 3.8.10.2.
>
>
> I suspect  that when you use "WHERE ABS(100 - totaltime) > 1"
> SQLite cannot use an index since it's an expression, so it does a
> full table scan, whereas the other solution which does
> "WHERE  totaltime  < 99 OR totaltime > 101"
> may use an index on totaltime (assuming that there is an index).
>
> In general using an index is good.  But if most of the records
> satisfy the condition "ABS(100 - totaltime) > 1" then an index
> can be more harmful than useful.   And that could explain
> why using "WHERE ABS(100 - totaltime) > 1" is a bit faster.
> You could try "EXPLAIN QUERY PLAN" on your queries to
> see if they use an index or if they do a full table scan.
>

​Totaltime is calculated, so it cannot have an index. ;-)
Besides from the almost 700.000 records only two satisfy the condition.

I should look into EXPLAIN QUERY PLAN.

-- 
Cecil Westerhof
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why is this so much more efficient?

2017-02-16 Thread Dominique Pellé
Cecil Westerhof  wrote:

> I have a table vmstat that I use to store vmstat info. ;-)
> At the moment it has more as 661 thousand records.
>
> In principle the values of usertime, systemtime, idletime, waittime and
> stolentime should add up to 100. I just wanted to check it. Of-course there
> could be a rounding error, so I wrote the following query:
> SELECT date
> ,  time
> ,  usertime
> ,  systemtime
> ,  idletime
> ,  waittime
> ,  stolentime
> ,  (usertime + systemtime + idletime + waittime + stolentime) AS
> totaltime
> FROM   vmstat
> WHERE  totaltime  < 99 OR totaltime > 101
>
> I did not like that, so I rewrote the WHERE to:
> WHERE  ABS(100 - totaltime) > 1
>
> The funny thing the second WHERE is more efficient as the first, where I
> would have expected it to be the other way around.
> The first takes around 1.050 milliseconds.
> The second takes around  950 milliseconds.
> So the second is around 10% more efficient. Why is this?
>
> In case it is important: I did this in sqlitebrowser 3.7.0, which uses
> SQLite 3.8.10.2.


I suspect  that when you use "WHERE ABS(100 - totaltime) > 1"
SQLite cannot use an index since it's an expression, so it does a
full table scan, whereas the other solution which does
"WHERE  totaltime  < 99 OR totaltime > 101"
may use an index on totaltime (assuming that there is an index).

In general using an index is good.  But if most of the records
satisfy the condition "ABS(100 - totaltime) > 1" then an index
can be more harmful than useful.   And that could explain
why using "WHERE ABS(100 - totaltime) > 1" is a bit faster.
You could try "EXPLAIN QUERY PLAN" on your queries to
see if they use an index or if they do a full table scan.

Dominique
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why is this so much more efficient?

2017-02-16 Thread R Smith
Basically, the DB size and compactness state influences the speed at 
which values are read. (Every update implies a read).


Add to that the fact that these functions verge on the bottom edge of 
time consumers... It's like testing diffusion speed of a perfume in a 
hurricane.


Glad you found the close competition between most-optimal solutions.


On 2017/02/15 8:43 PM, Cecil Westerhof wrote:

2017-02-15 14:18 GMT+01:00 Cecil Westerhof :


The OR version is the least efficient and it look likes the BETWEEN
version 2 is the most efficient. It looks like it uses less user and more
sys.


​Which is the most efficient is also dependent on the state of the database
itself. I compacted the database and now sometimes NOT BETWEEN is more
efficient and sometimes NOT BETWEEN version 2 is more efficient.
The OR version is always the least efficient.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why is this so much more efficient?

2017-02-15 Thread Cecil Westerhof
2017-02-15 14:18 GMT+01:00 Cecil Westerhof :

> The OR version is the least efficient and it look likes the BETWEEN
> version 2 is the most efficient. It looks like it uses less user and more
> sys.
>

​Which is the most efficient is also dependent on the state of the database
itself. I compacted the database and now sometimes NOT BETWEEN is more
efficient and sometimes NOT BETWEEN version 2 is more efficient.
The OR version is always the least efficient.

-- 
Cecil Westerhof
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why is this so much more efficient?

2017-02-15 Thread Cecil Westerhof
2017-02-15 13:40 GMT+01:00 Cecil Westerhof :

> I wrote the following Bash script:
>

​I wrote a better one. See end of post.

Inprinciple you are only interested in the totals. I also changed the runs
from 10 to 25.

The OR version is the least efficient and it look likes the BETWEEN version
2 is the most efficient. It looks like it uses less user and more sys.

The results of two runs:
Timing OR version
real: 7.345
user: 6.688
sys:  0.648


Timing ABS version
real: 6.403
user: 5.56
sys:  0.84


Timing NOT BETWEEN version
real: 6.382
user: 5.62
sys:  0.752


Timing NOT BETWEEN version 2
real: 6.291
user: 5.488
sys:  0.788

and:
Timing OR version
real: 7.228
user: 6.496
sys:  0.728


Timing ABS version
real: 6.647
user: 5.904
sys:  0.74


Timing NOT BETWEEN version
real: 6.431
user: 5.688
sys:  0.736


Timing NOT BETWEEN version 2
real: 6.263
user: 5.492
sys:  0.756


The script:
#!/usr/bin/env bash

set -o errexit
set -o nounset


declare -r COUNT=25
declare -r DISPLAY_ALL=F
declare -r SELECT='
SELECT date
,  time
,  usertime
,  systemtime
,  idletime
,  waittime
,  stolentime
,  (usertime + systemtime + idletime + waittime + stolentime) AS
totaltime
FROM   vmstat
'

declare -r ABS_VERSION="
${SELECT}
WHERE  ABS(100 - totaltime) > 1
;"
declare -r OR_VERSION="
${SELECT}
WHERE  totaltime  < 99 OR totaltime > 101
;"
declare -r NOT_BETWEEN_VERSION="
${SELECT}
WHERE totaltime NOT BETWEEN 99 AND 101
;"
declare -r NOT_BETWEEN_VERSION2="
${SELECT}
WHERE NOT (totaltime BETWEEN 99 AND 101)
;"


function timeQuery {
for i in $(seq ${COUNT}) ; do
sqlite3 ~/Databases/general.sqlite 

Re: [sqlite] Why is this so much more efficient?

2017-02-15 Thread Cecil Westerhof
2017-02-15 12:02 GMT+01:00 R Smith :

> Note however that this may not be entirely true. The Query might read data
> from the disk cache (or several memory caches may be in play) during the
> second run. Run each statement many times, and compare average return times.
>
> Also try this:
> ... WHERE totaltime NOT BETWEEN 99 AND 101;
> or
> ... WHERE NOT (totaltime BETWEEN 99 AND 101);
>

​I wrote the following Bash script:
#!/usr/bin/env bash

set -o errexit
set -o nounset


declare -r COUNT=10
declare -r SELECT='
SELECT date
,  time
,  usertime
,  systemtime
,  idletime
,  waittime
,  stolentime
,  (usertime + systemtime + idletime + waittime + stolentime) AS
totaltime
FROM   vmstat
'

declare -r ABS_VERSION="
${SELECT}
WHERE  ABS(100 - totaltime) > 1
;"
declare -r OR_VERSION="
${SELECT}
WHERE  totaltime  < 99 OR totaltime > 101
;"
declare -r NOT_BETWEEN_VERSION="
${SELECT}
WHERE totaltime NOT BETWEEN 99 AND 101
;"
declare -r NOT_BETWEEN_VERSION2="
${SELECT}
WHERE NOT (totaltime BETWEEN 99 AND 101)
;"


function timeQuery {
for i in $(seq ${COUNT}) ; do
sqlite3 ~/Databases/general.sqlite 

Re: [sqlite] Why is this so much more efficient?

2017-02-15 Thread Cecil Westerhof
2017-02-15 12:02 GMT+01:00 R Smith :

>
> On 2017/02/15 12:33 PM, Cecil Westerhof wrote:
>
>> I have a table vmstat that I use to store vmstat info. ;-)
>> At the moment it has more as 661 thousand records.
>>
>> In principle the values of usertime, systemtime, idletime, waittime and
>> stolentime should add up to 100. I just wanted to check it. Of-course
>> there
>> could be a rounding error, so I wrote the following query:
>> SELECT date
>> ,  time
>> ,  usertime
>> ,  systemtime
>> ,  idletime
>> ,  waittime
>> ,  stolentime
>> ,  (usertime + systemtime + idletime + waittime + stolentime) AS
>> totaltime
>> FROM   vmstat
>> WHERE  totaltime  < 99 OR totaltime > 101
>>
>> I did not like that, so I rewrote the WHERE to:
>> WHERE  ABS(100 - totaltime) > 1
>>
>> The funny thing the second WHERE is more efficient as the first, where I
>> would have expected it to be the other way around.
>> The first takes around 1.050 milliseconds.
>> The second takes around  950 milliseconds.
>> So the second is around 10% more efficient. Why is this?
>>
>
> That's because the first one executes 2 checks mostly - it first checks to
> see if totaltime < 99, if so then it returns true, if not, then a second
> comparison has to be done... so 2 comparison functions on many items.
>
> The second check involves a single calculation and comparison - so what
> you have deduced is that the "minus" function is slightly more efficient
> than occasional extra comparison function.
>

​Minus function and abs function.​




> Note however that this may not be entirely true. The Query might read data
> from the disk cache (or several memory caches may be in play) during the
> second run. Run each statement many times, and compare average return times.
>

​I did. Not very much. But maybe I should try it a bit more often.​



Also try this:
> ... WHERE totaltime NOT BETWEEN 99 AND 101;
> or
> ... WHERE NOT (totaltime BETWEEN 99 AND 101);
>

​I will try those also.

-- 
Cecil Westerhof
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why is this so much more efficient?

2017-02-15 Thread R Smith


On 2017/02/15 12:33 PM, Cecil Westerhof wrote:

I have a table vmstat that I use to store vmstat info. ;-)
At the moment it has more as 661 thousand records.

In principle the values of usertime, systemtime, idletime, waittime and
stolentime should add up to 100. I just wanted to check it. Of-course there
could be a rounding error, so I wrote the following query:
SELECT date
,  time
,  usertime
,  systemtime
,  idletime
,  waittime
,  stolentime
,  (usertime + systemtime + idletime + waittime + stolentime) AS
totaltime
FROM   vmstat
WHERE  totaltime  < 99 OR totaltime > 101

I did not like that, so I rewrote the WHERE to:
WHERE  ABS(100 - totaltime) > 1

The funny thing the second WHERE is more efficient as the first, where I
would have expected it to be the other way around.
The first takes around 1.050 milliseconds.
The second takes around  950 milliseconds.
So the second is around 10% more efficient. Why is this?


That's because the first one executes 2 checks mostly - it first checks 
to see if totaltime < 99, if so then it returns true, if not, then a 
second comparison has to be done... so 2 comparison functions on many items.


The second check involves a single calculation and comparison - so what 
you have deduced is that the "minus" function is slightly more efficient 
than occasional extra comparison function.


Note however that this may not be entirely true. The Query might read 
data from the disk cache (or several memory caches may be in play) 
during the second run. Run each statement many times, and compare 
average return times.


Also try this:
... WHERE totaltime NOT BETWEEN 99 AND 101;
or
... WHERE NOT (totaltime BETWEEN 99 AND 101);

Cheers,
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users