What I'd do:
1) Look at the indices, and make sure the input is sorted so rows are
inserted in index order. Also drop any unnecessary indices and add them
back at the end. [Read the code for vacuum to see what kinds of things
make sense to defer.]
2) Bump up the cache a lot. Inserting in sorted order makes this less
essential, but for a one-off like this it can't hurt.
3) Turn synchronous off while loading, and commit to starting over if you
get an OS-level crash. Even with synchronous off, things should be
correctly ordered for app-level crashes.
4) Maybe use exclusive locking? That wouldn't have a huge impact if you're
batching inserts, I expect.
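Points 1-4 together might look something like the sketch below, using
Python's built-in sqlite3 module. The table name, schema, and row data are
all made up for illustration; the pragma values are just plausible
starting points, not tuned numbers.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "bulk.db")  # hypothetical target
conn = sqlite3.connect(db_path)

# 2) Bump the page cache (negative value = size in KiB; here ~256 MiB).
conn.execute("PRAGMA cache_size = -262144")
# 3) Skip fsync during the load; commit to restarting after an OS crash.
conn.execute("PRAGMA synchronous = OFF")
# 4) Hold the file lock for the whole load instead of per transaction.
conn.execute("PRAGMA locking_mode = EXCLUSIVE")

conn.execute("CREATE TABLE articles(id INTEGER PRIMARY KEY, body TEXT)")

# 1) Insert in primary-key order so b-tree pages fill sequentially,
#    and create secondary indexes only after the data is in place.
rows = sorted((i, "article %d" % i) for i in range(10_000))
with conn:  # one bulk transaction; batch as needed for a real 60 GB load
    conn.executemany("INSERT INTO articles VALUES (?, ?)", rows)
conn.execute("CREATE INDEX articles_body ON articles(body)")

count = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
conn.close()
```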
---
WRT #3, you could also consider journal_mode off (or memory, if your code
requires transactions to work right). In that case, the database state is
indeterminate if you have an app-level crash, but you should be fine if you
make it to the end.
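To illustrate the journal_mode trade-off: with journal_mode = MEMORY the
rollback journal lives in RAM, so app-level errors still roll back
cleanly, but a crash mid-write can leave a file-backed database in an
indeterminate state. A minimal sketch (in-memory database as a stand-in):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in; a real load writes a file
# Journal in RAM: transactions still roll back on app-level exceptions,
# but there is no on-disk journal to recover from after a crash.
conn.execute("PRAGMA journal_mode = MEMORY")
conn.execute("CREATE TABLE t(x INTEGER)")
try:
    with conn:  # transaction: commits on success, rolls back on exception
        conn.execute("INSERT INTO t VALUES (1)")
        raise RuntimeError("simulated app-level failure")
except RuntimeError:
    pass
count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # rolled back
```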
WRT #1, I would consider FTS an index for these purposes, but it may
require schema changes to make it possible to selectively disable/enable
the indexing. See https://sqlite.org/fts5.html#external_content_tables .
I'm not sure there would be much gain from disabling FTS during loading,
though, as long as your bulk transactions are large.
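The useful property of an external-content FTS5 table here is that it is
not updated automatically, so you can skip it during the bulk load and
backfill the index in one pass at the end. A sketch, with a made-up
articles table (requires an SQLite build with FTS5, which stock builds
have):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles(id INTEGER PRIMARY KEY, body TEXT)")
# External-content FTS5 table: stores only the full-text index and reads
# the text from 'articles'. Nothing is indexed until you insert into it.
conn.execute(
    "CREATE VIRTUAL TABLE articles_fts USING fts5("
    "body, content='articles', content_rowid='id')"
)
conn.executemany("INSERT INTO articles VALUES (?, ?)",
                 [(1, "hello world"), (2, "goodbye world")])
# Backfill the full-text index in one pass after the bulk load:
conn.execute(
    "INSERT INTO articles_fts(rowid, body) SELECT id, body FROM articles")
hits = conn.execute(
    "SELECT rowid FROM articles_fts WHERE articles_fts MATCH 'hello'"
).fetchall()
```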
-scott
On Mon, Apr 24, 2017 at 11:00 AM, Jens Alfke wrote:
> I’m importing a large data set with a lot of rows — an entire Wikipedia
> dump, about 60GB, one article per row — into a brand new SQLite database in
> WAL mode. What’s the fastest way to import it?
>
> I started with one big transaction, but noted that (of course) the WAL
> file was growing rapidly while the main database file stayed tiny. I
> figured this would become inefficient, so I stopped the run and adjusted my
> code to commit and re-open a transaction every 10,000 rows.
>
> With that, the import started quickly, but as time went on the commits
> were taking longer and longer, so the process was spending most of its time
> committing. (I wasn’t able to finish the job, as it ran into an unrelated
> fatal error in my code about ⅔ of the way through.)
>
> Would it have been faster to use a single transaction? Even if the commit
> at the end is epic-length, it wouldn’t be rewriting the b-tree nodes over
> and over again. If so, would periodic WAL checkpoints help?
>
> —Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users