Re: [sqlite] Best practices for huge batch inserts with WAL?

2017-04-24 Thread Clemens Ladisch
Scott Hess wrote:
> WRT #3, you could also consider journal_mode off (or memory, if your code
> requires transactions to work right).  In that case, the database state is
> indeterminate if you have an app-level crash, but you should be fine if you
> make it to the end.

It would be a better idea to switch to a rollback journal mode (DELETE,
TRUNCATE, or PERSIST): that is guaranteed to work correctly in all situations,
and because the database being filled starts out small, the journal cannot
become large, since it stores only the _old_ page contents.
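
A minimal sketch of that approach (assuming the bulk load runs on a single
connection; switching out of WAL requires that no other connection has the
database open):

    PRAGMA journal_mode = DELETE;   -- or TRUNCATE / PERSIST; leaves WAL mode
    BEGIN;
    -- ... bulk INSERTs go here ...
    COMMIT;
    PRAGMA journal_mode = WAL;      -- switch back for normal operation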


Regards,
Clemens


Re: [sqlite] Best practices for huge batch inserts with WAL?

2017-04-24 Thread Scott Hess
What I'd do:

1) Look at the indices, and make sure the input is sorted to insert in
index order.  Also drop any unnecessary indices and add them back at the
end.  [Read the code for vacuum to see what kinds of things make sense to
defer.]

2) Bump up the cache a lot.  Inserting in sorted order makes this less
essential, but for a one-off like this it can't hurt.

3) Turn synchronous off while loading, and commit to starting over if you
get an OS-level crash.  Even with synchronous off, things should be
correctly ordered for app-level crashes.

4) Maybe use exclusive locking?  That wouldn't have a huge impact if you're
batching inserts, I expect.  (A sketch of these settings follows below.)
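
A rough sketch of those settings, run on the loading connection before the
bulk insert begins (the values are illustrative, not prescriptive):

    PRAGMA cache_size = -1000000;     -- step 2: ~1 GB page cache (negative value = size in KiB)
    PRAGMA synchronous = OFF;         -- step 3: no fsyncs; restart the load after an OS crash
    PRAGMA locking_mode = EXCLUSIVE;  -- step 4: hold the file lock for the whole load
    -- step 1: DROP any nonessential indices here, insert in index order,
    -- and CREATE INDEX again once the load has finished.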

---

WRT #3, you could also consider journal_mode off (or memory, if your code
requires transactions to work right).  In that case, the database state is
indeterminate if you have an app-level crash, but you should be fine if you
make it to the end.
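
For completeness, the corresponding pragmas (a sketch, not from the original
message):

    -- either: no rollback journal at all (ROLLBACK no longer works)
    PRAGMA journal_mode = OFF;
    -- or: journal kept in RAM, so transactions still roll back normally
    PRAGMA journal_mode = MEMORY;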

WRT #1, I would consider fts as an index for these purposes, but it may
require schema changes to make it possible to selectively disable/enable
the indexing.  See https://sqlite.org/fts5.html#external_content_tables .
I'm not sure there would be much gain from disabling fts when loading,
though, as long as your bulk transactions are large.
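
A hedged sketch of the external-content arrangement that page describes (the
table and column names here are invented): the full-text index can be built
after the bulk load instead of being maintained row by row.

    -- content table that holds the article text
    CREATE TABLE articles(id INTEGER PRIMARY KEY, title TEXT, body TEXT);

    -- FTS5 index backed by the content table
    CREATE VIRTUAL TABLE articles_fts USING fts5(
        title, body,
        content='articles', content_rowid='id'
    );

    -- after the bulk load, build the full-text index in one pass
    INSERT INTO articles_fts(articles_fts) VALUES('rebuild');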

-scott


On Mon, Apr 24, 2017 at 11:00 AM, Jens Alfke wrote:

> I’m importing a large data set with a lot of rows — an entire Wikipedia
> dump, about 60GB, one article per row — into a brand new SQLite database in
> WAL mode. What’s the fastest way to import it?
>
> I started with one big transaction, but noted that (of course) the WAL
> file was growing rapidly while the main database file stayed tiny. I
> figured this would become inefficient, so I stopped the run and adjusted my
> code to commit and re-open a transaction every 10,000 rows.
>
> With that, the import started quickly, but as time went on the commits
> were taking longer and longer, so the process was spending most of its time
> committing. (I wasn’t able to finish the job, as it ran into an unrelated
> fatal error in my code about ⅔ of the way through.)
>
> Would it have been faster to use a single transaction? Even if the commit
> at the end is epic-length, it wouldn’t be rewriting the b-tree nodes over
> and over again. If so, would periodic WAL checkpoints help?
>
> —Jens


Re: [sqlite] Best practices for huge batch inserts with WAL?

2017-04-24 Thread John Found
Do you have any indices on the table? Drop them before the huge insert and
recreate them afterwards; it is faster.
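
A minimal sketch of that pattern (index and table names are made up):

    DROP INDEX IF EXISTS idx_articles_title;

    BEGIN;
    -- ... bulk INSERT INTO articles ... in large batches ...
    COMMIT;

    CREATE INDEX idx_articles_title ON articles(title);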

On Mon, 24 Apr 2017 11:00:00 -0700
Jens Alfke wrote:

> I’m importing a large data set with a lot of rows — an entire Wikipedia dump, 
> about 60GB, one article per row — into a brand new SQLite database in WAL 
> mode. What’s the fastest way to import it?
> 
> I started with one big transaction, but noted that (of course) the WAL file 
> was growing rapidly while the main database file stayed tiny. I figured this 
> would become inefficient, so I stopped the run and adjusted my code to commit 
> and re-open a transaction every 10,000 rows.
> 
> With that, the import started quickly, but as time went on the commits were 
> taking longer and longer, so the process was spending most of its time 
> committing. (I wasn’t able to finish the job, as it ran into an unrelated 
> fatal error in my code about ⅔ of the way through.)
> 
> Would it have been faster to use a single transaction? Even if the commit at 
> the end is epic-length, it wouldn’t be rewriting the b-tree nodes over and 
> over again. If so, would periodic WAL checkpoints help?
> 
> —Jens


-- 
http://fresh.flatassembler.net
http://asm32.info
John Found 