Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-02 Thread Florian Weimer
* Domingo Alvarez Duarte:

> After 12 hours inserting of:
>
> 934,135,285 records on bolsas_familia
>
> 22,711,259 records in favorecidos
>
> 5,570 records in municipios
>
> ...

Insertion will be faster if you create the index after populating the
tables.
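
[A sketch of this advice using Python's bundled sqlite3 module; the table and column names are simplified stand-ins for the real schema, not the poster's actual code:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for the bolsa_familia schema from the thread.
con.execute("CREATE TABLE bolsa_familia(favorecido_id INTEGER, valor REAL)")

# Load first, inside one transaction, with no index to maintain per row.
with con:
    con.executemany("INSERT INTO bolsa_familia VALUES (?, ?)",
                    ((i % 1000, float(i)) for i in range(100_000)))

# Build the index once at the end: one bulk sort instead of one
# b-tree insertion per row during the load.
con.execute("CREATE INDEX idx_favorecido ON bolsa_familia(favorecido_id)")

n = con.execute("SELECT COUNT(*) FROM bolsa_familia").fetchone()[0]
```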

> time sqlite3 bolsa_familia3.db "vacuum;"
>
> real    147m6.252s
> user    10m53.790s
> sys     3m43.663s

You really need to increase the cache size if at all possible.
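
[For scale, a minimal sketch of raising the page cache; a negative value sets the size in KiB, and the ~2 GiB figure here is an illustrative assumption to tune against available RAM:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The default cache is small (2000 pages historically); a negative
# value sets the cache size in KiB, so this asks for roughly 2 GiB.
con.execute("PRAGMA cache_size = -2000000")
size = con.execute("PRAGMA cache_size").fetchone()[0]
```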
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Stephen Chrzanowski
VACUUM removes empty pages by rewriting the database from the ground up.

https://www.sqlite.org/lang_vacuum.html

"The VACUUM command works by copying the contents of the database into a
temporary database file and then overwriting the original with the contents
of the temporary file. When overwriting the original, a rollback journal or
write-ahead log (WAL) file is used just as it would be for any other database
transaction. This means that when VACUUMing a database, as much as twice
the size of the original database file is required in free disk space."

I'm not aware whether the engine actually checks first for free pages.  I
don't see anything in the documentation on that page saying it does a
pre-check for empty pages before deciding to do the work of a vacuum.  But
VACUUM also defragments the database internally, and I doubt there's a check
for fragmentation either, so I'd lean towards this being a command that does
its work regardless.

Try writing out a large database (something that will take time to read and
write, maybe a few hundred megabytes) and run two vacuums one right after
the other, with no other transactions in between.
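
[The suggested experiment can be scripted; a sketch with sizes scaled down so it runs quickly:]

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "vacuum_test.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t(x BLOB)")
with con:
    con.executemany("INSERT INTO t VALUES (?)",
                    ((os.urandom(1024),) for _ in range(2000)))
with con:
    con.execute("DELETE FROM t WHERE rowid % 2 = 0")  # leave free pages behind

timings = []
for _ in range(2):
    t0 = time.perf_counter()
    con.execute("VACUUM")
    timings.append(time.perf_counter() - t0)
# If VACUUM pre-checked for free pages, the second run (on an
# already-compacted file) would be near-free; in practice it rewrites anyway.
```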


On Sat, Oct 1, 2016 at 10:24 PM, Howard Chu  wrote:

> Domingo Alvarez Duarte wrote:
>
>> Hello Simon !
>>
>> I already did it without using "wal" and the result was the same.
>>
>> And even, to my surprise, in one try I stopped in the middle and performed
>> an "analyze", and the performance deteriorated so much that I needed to
>> delete the stats tables to get back the better performance without "analyze".
>>
>> I also tried with the lsm module and got a bit better performance but
>> with an
>> irregular timing and a bigger disk usage (20%).
>>
>> Also tested with lmdb with an astonishing insertion rate but with a lot
>> more
>> disk usage and irregular timing.
>>
>
> Using LMDB the VACUUM command is supposed to be a no-op; at least that's
> how I intended it. Since LMDB deletes records immediately instead of
> leaving tombstones, there is nothing to vacuum.
>


Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Howard Chu

Domingo Alvarez Duarte wrote:

Hello Simon !

I already did it without using "wal" and the result was the same.

And even, to my surprise, in one try I stopped in the middle and performed an
"analyze", and the performance deteriorated so much that I needed to
delete the stats tables to get back the better performance without "analyze".

I also tried with the lsm module and got a bit better performance but with an
irregular timing and a bigger disk usage (20%).

Also tested with lmdb with an astonishing insertion rate but with a lot more
disk usage and irregular timing.


Using LMDB the VACUUM command is supposed to be a no-op; at least that's how I 
intended it. Since LMDB deletes records immediately instead of leaving 
tombstones, there is nothing to vacuum.


Also tested with leveldb with a worse performance and almost twice disk space
usage.

The data distribution in some tables seems to fall into the worst corner cases
for btrees.

Cheers !


On 01/10/16 18:26, Simon Slavin wrote:

On 1 Oct 2016, at 10:18pm, Domingo Alvarez Duarte  wrote:


About the vacuum I also understand the need to rewrite the whole database
but I'm not sure if it's really necessary to do almost 5 times the database
size in both reads and writes (also an equivalent amount of I/O happened
during insertions).

Can you try it without

db.exec_dml("PRAGMA wal_checkpoint(FULL);");

and see if that improves time ?

That's the only thing I can see.  You're using a nested INSERT OR IGNORE
command I'm not familiar with.

Simon.



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/


Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Jay Kreibich


What was the size of the original database?

To VACUUM a database, the process is:

1) Read the logical components of the database, write them to a new file.  This 
will generate at least 1x reads (original size), and 1x writes (final size).  
In most cases the reads will be larger and the writes will be larger, because 
some blocks are touched more than once.  This is especially true if there are 
large indexes.

2) The new file is then copied back to the original file, one block at a time.  
This requires 1x (final) read and 1x write (final).

3) Except, to make the write-back ACID safe, each block of the original 
database needs to be copied during the write-back process, which requires 
another 1x read (final) and 1x write (final) in journal mode.  WAL numbers are 
similar, if not higher.

In other words, in the ideal case you're going to have a bare minimum of 3x 
final writes and 2x final + 1x original reads.  But even on a freshly VACUUMed 
database you'll never see ideal numbers, especially if there are indexes: 
indexes are rebuilt by insertion, so if the source table data is not in index 
order, that can require a lot of data shuffling (i.e. extra reads/writes).


In a similar note, SQLite typically requires about ~2x final size of free 
storage space to complete a VACUUM.
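
[Jay's accounting can be written down as a rough lower-bound estimate; real numbers, as he notes, come out higher once re-touched blocks and index rebuilds are counted:]

```python
def vacuum_io_lower_bound(original_gb, final_gb):
    """Minimum read/write volume for a journal-mode VACUUM, in GB."""
    reads = original_gb      # step 1: read the logical contents
    writes = final_gb        # step 1: write them to the temp file
    reads += final_gb        # step 2: read the temp file back
    writes += final_gb       # step 2: write it over the original
    reads += final_gb        # step 3: read original pages for the journal
    writes += final_gb       # step 3: write the rollback journal
    return reads, writes

# The 22 GB database from this thread: at least 66 GB read and 66 GB
# written.  The reported 134 GB / 117 GB is roughly double that, which
# is plausible for a heavily indexed database.
estimate = vacuum_io_lower_bound(22, 22)
```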


There are a number of ways to improve this.  Most of the I/O is in the 
write-back process, which is required for ACID-safe VACUUM transactions.  In 
2010 I proposed a “VACUUM TO” command that would VACUUM one database file into 
a new database file, essentially making a copy.  This would only require 1x 
original reads, ~1x final writes, and only 1x final free space.  The 
disadvantage is that you end up with a new file, which would require closing 
all connections (including those in other applications) and re-opening them.  
SQLite also does not trust OS filesystem commands (such as renaming a new file 
over an old one) to operate in any transaction/rollback-safe way, so it avoids 
them.  There seem to be a number of cases where that’s an acceptable 
alternative, however.

See:

http://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg87972.html
http://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg50941.html
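
[For what it's worth, a statement along these lines did later land in SQLite as `VACUUM INTO` (3.27). A minimal sketch, guarded because the library Python links against may be older:]

```python
import os
import sqlite3
import tempfile

src = os.path.join(tempfile.mkdtemp(), "src.db")
con = sqlite3.connect(src)
con.execute("CREATE TABLE t(x INTEGER)")
with con:
    con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

copied = None
if sqlite3.sqlite_version_info >= (3, 27, 0):
    dst = src + ".compact"
    # Writes a vacuumed copy to a new file; the original stays open and intact.
    con.execute("VACUUM INTO ?", (dst,))
    copied = sqlite3.connect(dst).execute(
        "SELECT COUNT(*) FROM t").fetchone()[0]
```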


  -j, author of “Using SQLite”, O’Reilly Media



On Oct 1, 2016, at 3:27 PM, Domingo Alvarez Duarte  wrote:

> Hello !
> 
> I'm using sqlite (trunk) for a database (see below), and for a final database 
> file of 22GB a "vacuum" was executed, which made a lot of I/O 
> (134GB reads and 117GB writes in 2h30min).
> 
> Can something be improved on sqlite to achieve a better performance ?
> 
> The data is public available just in case it can be useful to perform tests.
> 
> Cheers !
> 
> 

--  
Jay A. Kreibich < J A Y @ K R E I B I . C H >

"Intelligence is like underwear: it is important that you have it, but showing 
it to the wrong people has the tendency to make them feel uncomfortable." -- 
Angela Johnson





Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Domingo Alvarez Duarte

Hello Bob !

I'm using the default sqlite page size, but I also tried a 32KB page size 
and got a slightly smaller overall database size, but no visible 
performance gain in terms of time and I/O.


Also the memory usage skyrocketed, forcing memory swap.

The OS was OS X Yosemite.  I also posted earlier a small program with a 
sample of only the problematic data, which ends up with a database of around 
340MB and the same poor performance.


Cheers !


On 01/10/16 19:34, Bob Friesenhahn wrote:

On Sat, 1 Oct 2016, Domingo Alvarez Duarte wrote:


Hello !

I'm using sqlite (trunk) for a database (see bellow) and for a final 
database file of 22GB a "vacuum" was executed and doing so it  made a 
lot of I/O ( 134GB reads and 117GB writes in 2h:30min).


What means are you using to evaluate the total amount of I/O?

At what level (e.g. OS system call, individual disk I/O) are you 
measuring the I/O?


If the problem is more physical disk I/O than expected then is it 
possible that the underlying filesystem blocksize does not match the 
blocksize that SQLite is using?  You may have an issue with write 
amplification at the filesystem level.


Bob




Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Bob Friesenhahn

On Sat, 1 Oct 2016, Domingo Alvarez Duarte wrote:


Hello !

I'm using sqlite (trunk) for a database (see bellow) and for a final database 
file of 22GB a "vacuum" was executed and doing so it  made a lot of I/O ( 
134GB reads and 117GB writes in 2h:30min).


What means are you using to evaluate the total amount of I/O?

At what level (e.g. OS system call, individual disk I/O) are you 
measuring the I/O?


If the problem is more physical disk I/O than expected then is it 
possible that the underlying filesystem blocksize does not match the 
blocksize that SQLite is using?  You may have an issue with write 
amplification at the filesystem level.
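
[The blocksize comparison can be done from a script; a sketch, POSIX-only because of `os.statvfs`:]

```python
import os
import sqlite3

con = sqlite3.connect(":memory:")
page_size = con.execute("PRAGMA page_size").fetchone()[0]
fs_block = os.statvfs(".").f_bsize  # block size of the current filesystem

# If page_size < fs_block, every SQLite page write still costs a full
# filesystem block (write amplification); matching the two sizes avoids that.
```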


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Domingo Alvarez Duarte

Hello Keith !

You have probably seen in the attached code that I also tried that once, 
and even mmap, but it didn't give any visible improvement.  Also, due to 
the data distribution, 80% of the data inserts took 20% of the total time 
and the other 20% of the inserts took 80% of the total time.


The final database has poor overall performance compared to what I'm used 
to when using sqlite for small databases.  I also tried mysql and 
postgresql, but their performance for such a simple database is terrible.


Cheers !

db.exec_dml("PRAGMA synchronous = 0;");
db.exec_dml("PRAGMA journal_mode = WAL");
//db.exec_dml("PRAGMA journal_mode = MEMORY;");
//db.exec_dml("PRAGMA journal_mode = OFF;");
//db.exec_dml("PRAGMA locking_mode = EXCLUSIVE;");
db.exec_dml("PRAGMA temp_store = MEMORY;");
//db.exec_dml("PRAGMA threads = 4;");
//db.exec_dml("PRAGMA mmap_size = 6400;");
auto gigabyte = 1024LL*1024*1024; // 64-bit literal: gigabyte*16 would overflow a 32-bit int
db.exec_dml("PRAGMA mmap_size=" + (gigabyte*16));
//print("mmap_size", db.exec_get_one("PRAGMA mmap_size;"));

//db.exec_dml("PRAGMA cache_size = -64000");
//print("cache_size", db.exec_get_one("PRAGMA cache_size;"));


On 01/10/16 19:21, Keith Medcalf wrote:

Did you change the cache size?  The default is rather small for a database of 
22 GB.


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
On Behalf Of Domingo Alvarez Duarte
Sent: Saturday, 1 October, 2016 15:19
To: SQLite mailing list
Subject: Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

Hello Simon !

Thanks for reply !

I already know your suggestions, and if you look at the database schema
and the program used to insert data you can see that there are no
unnecessary indices active and everything is inside transactions.

About the vacuum I also understand the need to rewrite the whole
database but I'm not sure if it's really necessary to do almost 5 times
the database size in both reads and writes (also an equivalent amount of
I/O happened during insertions).

Cheers !


On 01/10/16 18:12, Simon Slavin wrote:

On 1 Oct 2016, at 9:27pm, Domingo Alvarez Duarte <mingo...@gmail.com>

wrote:

I'm using sqlite (trunk) for a database (see below), and for a final
database file of 22GB a "vacuum" was executed, which made a lot
of I/O (134GB reads and 117GB writes in 2h30min).

Can something be improved on sqlite to achieve a better performance ?

VACUUM rewrites the entire database.  It will always do a lot of IO.

You should never need to use VACUUM in a production setting.  Perhaps in a
once-a-year maintenance utility but not in normal use.

The fastest way to do lots of insertion is

DROP all INDEXes
DELETE FROM all TABLEs
Do your insertions, bundling up each thousand (ten thousand ?
depends on your system) uses of INSERT in a transaction
if you really want to do VACUUM, do it here
reCREATE all your INDEXes
ANALYZE

(the ANALYZE will also do lots of IO, not as much as VACUUM, but it may

speed up all WHERE / ORDER BY clauses).

Simon.


Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Keith Medcalf

Did you change the cache size?  The default is rather small for a database of 
22 GB.

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Domingo Alvarez Duarte
> Sent: Saturday, 1 October, 2016 15:19
> To: SQLite mailing list
> Subject: Re: [sqlite] Why so much I/O ? Can sqlite be improved ?
> 
> Hello Simon !
> 
> Thanks for reply !
> 
> I already know your suggestions, and if you look at the database schema
> and the program used to insert data you can see that there are no
> unnecessary indices active and everything is inside transactions.
> 
> About the vacuum I also understand the need to rewrite the whole
> database but I'm not sure if it's really necessary to do almost 5 times
> the database size in both reads and writes (also an equivalent amount of
> I/O happened during insertions).
> 
> Cheers !
> 
> 
> On 01/10/16 18:12, Simon Slavin wrote:
> > On 1 Oct 2016, at 9:27pm, Domingo Alvarez Duarte <mingo...@gmail.com>
> wrote:
> >
> >> I'm using sqlite (trunk) for a database (see below), and for a final
> database file of 22GB a "vacuum" was executed, which made a lot
> of I/O (134GB reads and 117GB writes in 2h30min).
> >>
> >> Can something be improved on sqlite to achieve a better performance ?
> > VACUUM rewrites the entire database.  It will always do a lot of IO.
> > You should never need to use VACUUM in a production setting.  Perhaps in a
> > once-a-year maintenance utility but not in normal use.
> >
> > The fastest way to do lots of insertion is
> >
> > DROP all INDEXes
> > DELETE FROM all TABLEs
> > Do your insertions, bundling up each thousand (ten thousand ?
> > depends on your system) uses of INSERT in a transaction
> > if you really want to do VACUUM, do it here
> > reCREATE all your INDEXes
> > ANALYZE
> >
> > (the ANALYZE will also do lots of IO, not as much as VACUUM, but it may
> speed up all WHERE / ORDER BY clauses).
> >
> > Simon.


Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Domingo Alvarez Duarte

Hello Simon !

I already did it without using "wal" and the result was the same.

And even, to my surprise, in one try I stopped in the middle and performed 
an "analyze", and the performance deteriorated so much that I 
needed to delete the stats tables to get back the better performance 
without "analyze".


I also tried with the lsm module and got a bit better performance but 
with an irregular timing and a bigger disk usage (20%).


Also tested with lmdb with an astonishing insertion rate but with a lot 
more disk usage and irregular timing.


Also tested with leveldb with a worse performance and almost twice disk 
space usage.


The data distribution in some tables seems to fall into the worst corner 
cases for btrees.


Cheers !


On 01/10/16 18:26, Simon Slavin wrote:

On 1 Oct 2016, at 10:18pm, Domingo Alvarez Duarte  wrote:


About the vacuum I also understand the need to rewrite the whole database but 
I'm not sure if it's really necessary to do almost 5 times the database size in 
both reads and writes (also an equivalent amount of I/O happened during 
insertions).

Can you try it without

db.exec_dml("PRAGMA wal_checkpoint(FULL);");

and see if that improves time ?

That's the only thing I can see.  You're using a nested INSERT OR IGNORE 
command I'm not familiar with.

Simon.


Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Simon Slavin

On 1 Oct 2016, at 10:18pm, Domingo Alvarez Duarte  wrote:

> About the vacuum I also understand the need to rewrite the whole database but 
> I'm not sure if it's really necessary to do almost 5 times the database size 
> in both reads and writes (also an equivalent amount of I/O happened during 
> insertions).

Can you try it without

   db.exec_dml("PRAGMA wal_checkpoint(FULL);");

and see if that improves time ?

That's the only thing I can see.  You're using a nested INSERT OR IGNORE 
command I'm not familiar with.

Simon.


Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Domingo Alvarez Duarte

Hello Simon !

Thanks for reply !

I already know your suggestions, and if you look at the database schema 
and the program used to insert data you can see that there are no 
unnecessary indices active and everything is inside transactions.


About the vacuum I also understand the need to rewrite the whole 
database but I'm not sure if it's really necessary to do almost 5 times 
the database size in both reads and writes (also an equivalent amount of 
I/O happened during insertions).


Cheers !


On 01/10/16 18:12, Simon Slavin wrote:

On 1 Oct 2016, at 9:27pm, Domingo Alvarez Duarte  wrote:


I'm using sqlite (trunk) for a database (see below), and for a final database file of 
22GB a "vacuum" was executed, which made a lot of I/O (134GB reads 
and 117GB writes in 2h30min).

Can something be improved on sqlite to achieve a better performance ?

VACUUM rewrites the entire database.  It will always do a lot of IO.  You 
should never need to use VACUUM in a production setting.  Perhaps in a 
once-a-year maintenance utility but not in normal use.

The fastest way to do lots of insertion is

DROP all INDEXes
DELETE FROM all TABLEs
Do your insertions, bundling up each thousand (ten thousand ?
depends on your system) uses of INSERT in a transaction
if you really want to do VACUUM, do it here
reCREATE all your INDEXes
ANALYZE

(the ANALYZE will also do lots of IO, not as much as VACUUM, but it may speed 
up all WHERE / ORDER BY clauses).

Simon.


Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Simon Slavin

On 1 Oct 2016, at 9:27pm, Domingo Alvarez Duarte  wrote:

> I'm using sqlite (trunk) for a database (see below), and for a final database 
> file of 22GB a "vacuum" was executed, which made a lot of I/O 
> (134GB reads and 117GB writes in 2h30min).
> 
> Can something be improved on sqlite to achieve a better performance ?

VACUUM rewrites the entire database.  It will always do a lot of IO.  You 
should never need to use VACUUM in a production setting.  Perhaps in a 
once-a-year maintenance utility but not in normal use.

The fastest way to do lots of insertion is

DROP all INDEXes
DELETE FROM all TABLEs
Do your insertions, bundling up each thousand (ten thousand ?
depends on your system) uses of INSERT in a transaction
if you really want to do VACUUM, do it here
reCREATE all your INDEXes
ANALYZE

(the ANALYZE will also do lots of IO, not as much as VACUUM, but it may speed 
up all WHERE / ORDER BY clauses).
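
[The recipe above, sketched end to end; the batch size and the one-column schema are placeholders:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(x INTEGER)")          # placeholder schema
con.execute("CREATE INDEX idx_x ON t(x)")

rows = [(i,) for i in range(25_000)]
BATCH = 10_000  # "thousand (ten thousand?) ... depends on your system"

con.execute("DROP INDEX idx_x")                   # DROP all INDEXes
con.execute("DELETE FROM t")                      # DELETE FROM all TABLEs
for i in range(0, len(rows), BATCH):              # batched INSERTs,
    with con:                                     # one transaction each
        con.executemany("INSERT INTO t VALUES (?)", rows[i:i + BATCH])
# (VACUUM would go here, if really wanted)
con.execute("CREATE INDEX idx_x ON t(x)")         # reCREATE all INDEXes
con.execute("ANALYZE")                            # ANALYZE

n = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```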

Simon.