On Sat, Aug 29, 2015 at 4:16 AM, Yahoo! Mail
<stefanossofroniou542 at yahoo.com> wrote:

> On 08/28/2015 09:36 PM, Scott Robison wrote:
>
>> On Fri, Aug 28, 2015 at 3:47 AM, Yahoo! Mail
>> <stefanossofroniou542 at yahoo.com> wrote:
>>>
>>> On 08/26/2015 09:03 PM, Richard Hipp wrote:
>>>
>>>> Time stands still for multiple rows, as long as they are within the
>>>> same sqlite3_step() call.  For example, if you run:
>>>>
>>>>       CREATE TABLE t1(a DATETIME);
>>>>       WITH RECURSIVE
>>>>          c(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c WHERE
>>>> x<1000000000)
>>>>       INSERT INTO t1(a) SELECT datetime('now') FROM c;
>>>>
>>>> The entire INSERT statement will be a single sqlite3_step() call, and
>>>> so all billion rows of t1 will get set to the same time.
>>>>
>>>
>>> I have tested this code with version 3.8.10.2 using "DB Browser for
>>> SQLite" and it would crash; the same with SQLite Manager that uses the
>>> same version. I have decided to use the latest SQLite version available
>>> from GNU/Linux Debian testing 64-bit; it's 3.8.11.1 and something strange
>>> is happening. I hope I'm the reason behind its rather peculiar behavior.
>>>
>>> I tested the following code
>>>
>>> sqlite> drop table if exists t1; create table t1(a datetime); begin; with
>>> recursive c(x) as (values(1) union all select x + 1 from c where x <
>>> 1000000000) insert into t1(a) select datetime('now') from c; commit;
>>>
>>> It would take ages to finish, and that is logical; it has a billion rows
>>> to insert, even though I have used begin ... commit.
>>>
>>> Now, the rather strange behavior appears when I use the up arrow key to
>>> reuse the previous code, replacing 1000000000 with 10. It takes the same
>>> time to finish as before, and that is not right; I'm dropping the table,
>>> and inserting 10 rows should take nearly 0 milliseconds.
>>>
>>> Can someone confirm this?
>>>
>> I have been using 3.8.11, so I just downloaded 3.8.11.1 to test your exact
>> query, cut & pasted from above. Here is the output from my session:
>>
>> X:\>sqlite3
>> SQLite version 3.8.11.1 2015-07-29 20:00:57
>> Enter ".help" for usage hints.
>> Connected to a transient in-memory database.
>> Use ".open FILENAME" to reopen on a persistent database.
>> sqlite> .timer on
>> sqlite>  drop table if exists t1; create table t1(a datetime); begin; with
>> recursive c(x) as (values(1) union all select x + 1 from c where x <
>> 10000000) insert into t1(a) select datetime('now') from c; commit;
>> Run Time: real 16.751 user 16.520506 sys 0.062400
>> sqlite>  drop table if exists t1; create table t1(a datetime); begin; with
>> recursive c(x) as (values(1) union all select x + 1 from c where x < 10)
>> insert into t1(a) select datetime('now') from c; commit;
>> Run Time: real 0.387 user 0.374402 sys 0.000000
>> sqlite>
>>
>> Note: I pasted the first query that begins with drop table, then used up
>> arrow and edited the 10000000 to 10 for the second query. Further, I
>> didn't have time to try a full billion, so I settled for 10 million. I
>> also used a transient in-memory database. Finally, I am using Windows 7
>> Professional on a box with 16 GiB of RAM.
>>
> I tested it with a local (on-disk) database, not an in-memory one. I wanted
> someone else to test it so we could see whether it's something to do with
> some kind of disk access limitation that appears after a number of repeated
> attempts.


I misunderstood. I inferred from your original message that you thought
scrolling back through the history and editing the query resulted in the
original query still being executed.

SQLite version 3.8.11.1 2015-07-29 20:00:57
Enter ".help" for usage hints.
sqlite> .timer on
sqlite> drop table if exists t1; create table t1(a datetime); begin; with
   ...> recursive c(x) as (values(1) union all select x + 1 from c where x <
   ...> 1000000000) insert into t1(a) select datetime('now') from c; commit;
Run Time: real 3087.125 user 2534.593750 sys 329.078125
sqlite> drop table if exists t1; create table t1(a datetime); begin; with
   ...> recursive c(x) as (values(1) union all select x + 1 from c where x <
   ...> 10) insert into t1(a) select datetime('now') from c; commit;
Run Time: real 764.309 user 84.031250 sys 132.812500

Watching performance while it ran, most of the time on the second query
seemed to be spent in the drop table command, and it was primarily disk IO.
Note that user + sys is only about 216 seconds out of 764; that leaves a lot
of time for disk IO. So I repeated the first query and then ran only the
drop table part of the second:

sqlite> drop table if exists t1; create table t1(a datetime); begin; with
   ...> recursive c(x) as (values(1) union all select x + 1 from c where x <
   ...> 1000000000) insert into t1(a) select datetime('now') from c; commit;
Run Time: real 4203.249 user 2536.750000 sys 320.281250
sqlite> drop table if exists t1;
Run Time: real 4322.951 user 146.187500 sys 698.984375

Note that inserting one billion records took about 1200 seconds longer the
second time (where I was reusing the same database file). I would imagine
this is because it is faster to extend the file by a page (one seek and one
write) than it is to update the freelist to make a page available (seek,
read, seek, write). 1200 seconds over the course of 28,000,000 pages isn't
much on a per-page basis, but it adds up. More interestingly, the user and
sys times are statistically identical between the first run of one billion
and the second. That tells us the difference was IO driven (or that
utilization of my computer went way up, or a combination of the two). The
real time is *way* off what I expected, but the usage profile of the
computer was very different by then. The user and sys times were somewhat
higher, but not horribly so (depending on your definition of horrible).
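
If anyone wants to confirm that the second run is pulling pages back off the
freelist rather than extending the file, the standard pragmas should show it
(a quick sketch; run them against the same on-disk database between steps --
the numbers mentioned below are what I'd expect, not something I measured):

sqlite> PRAGMA page_count;      -- total pages in the database file
sqlite> PRAGMA freelist_count;  -- pages currently sitting on the freelist
sqlite> PRAGMA page_size;       -- bytes per page (1024 by default, I believe)

Right after the drop table, freelist_count should be up around 28 million,
and after the second billion-row insert it should fall back toward zero as
those pages are reused.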

Finally, I did the rest of the second set of queries for 10 rows:

sqlite> create table t1(a datetime); begin; with
   ...> recursive c(x) as (values(1) union all select x + 1 from c where x <
   ...> 10) insert into t1(a) select datetime('now') from c; commit;
Run Time: real 0.181 user 0.000000 sys 0.000000

In any case, one billion records involves (on my machine) over 28 million
database pages. It takes a certain amount of time to add all those pages to
the freelist (on my machine it was able to update the freelist at the rate
of about 37,500 pages per second). Dropping a table that large means a lot
of bookkeeping has to be done.
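
As a rough sanity check on that page count (back-of-the-envelope, and the
per-row byte count is only an estimate): datetime('now') stores a 19-byte
text value, and with record, rowid, and cell overhead each row lands
somewhere around 28 bytes, so

    1,000,000,000 rows * ~28 bytes/row   =  roughly 28 GB of table content
    28 GB / 1024 bytes per page          =  roughly 28 million pages

which is in line with what I saw.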


>
>
>> Do you see the same behavior for a test of 10M rows (that the second line
>> takes the same amount of time as the first)?
>>
>>



-- 
Scott Robison
