What I'd do:

1) Look at the indices, and make sure the input is sorted so rows are
inserted in index order.  Also drop any unnecessary indices and add them
back at the end.  [Read the code for VACUUM to see what kinds of things
make sense to defer.]  (A sketch covering points 1-4 follows the list.)

2) Bump up the page cache a lot (PRAGMA cache_size).  Inserting in sorted
order makes this less essential, but for a one-off like this it can't hurt.

3) Turn synchronous off while loading (PRAGMA synchronous=OFF), and commit
to starting over if you get an OS-level crash.  Even with synchronous off,
writes are still correctly ordered with respect to app-level crashes.

4) Maybe use exclusive locking (PRAGMA locking_mode=EXCLUSIVE)?  I wouldn't
expect that to have a huge impact if you're batching inserts, though.
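
Roughly, and assuming a table named articles with an index named
articles_title_idx (both names are just placeholders for your actual
schema), the load might look like this:

    PRAGMA synchronous = OFF;          -- point 3: skip fsyncs during the load
    PRAGMA cache_size = -1000000;      -- point 2: about 1 GB of page cache
    PRAGMA locking_mode = EXCLUSIVE;   -- point 4: hold the lock for the whole load
    DROP INDEX IF EXISTS articles_title_idx;  -- point 1: defer index maintenance

    BEGIN;
    -- ... bulk INSERTs here, in key order (point 1) ...
    COMMIT;

    CREATE INDEX articles_title_idx ON articles(title);  -- rebuild at the end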

---

WRT #3, you could also consider journal_mode off (or memory, if your code
requires transactions to work right); note that either setting takes the
database out of WAL mode.  In that case, the database state is
indeterminate if you have an app-level crash, but you should be fine if you
make it to the end.
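
A minimal sketch of that variant (again, on the assumption that any crash
means starting the load over from scratch):

    PRAGMA journal_mode = OFF;     -- no rollback journal at all
    -- or, if ROLLBACK has to keep working:
    PRAGMA journal_mode = MEMORY;  -- journal kept in RAM only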

WRT #1, I would consider fts as an index for these purposes, but it may
require schema changes to make it possible to selectively disable/enable
the indexing.  See https://sqlite.org/fts5.html#external_content_tables .
I'm not sure there would be much gain from disabling fts when loading,
though, as long as your bulk transactions are large.
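
For reference, an external-content FTS5 table over a hypothetical
articles(id, title, body) table would look something like this; with this
layout you can leave the full-text index empty during the load and build
it in one pass afterward:

    CREATE VIRTUAL TABLE articles_fts USING fts5(
        title, body,
        content='articles',      -- index text stored in the articles table
        content_rowid='id'       -- rowid mapping into that table
    );

    -- After the bulk load, populate the full-text index in one pass:
    INSERT INTO articles_fts(articles_fts) VALUES('rebuild');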

-scott


On Mon, Apr 24, 2017 at 11:00 AM, Jens Alfke <j...@mooseyard.com> wrote:

> I’m importing a large data set with a lot of rows — an entire Wikipedia
> dump, about 60GB, one article per row — into a brand new SQLite database in
> WAL mode. What’s the fastest way to import it?
>
> I started with one big transaction, but noted that (of course) the WAL
> file was growing rapidly while the main database file stayed tiny. I
> figured this would become inefficient, so I stopped the run and adjusted my
> code to commit and re-open a transaction every 10,000 rows.
>
> With that, the import started quickly, but as time went on the commits
> were taking longer and longer, so the process was spending most of its time
> committing. (I wasn’t able to finish the job, as it ran into an unrelated
> fatal error in my code about ⅔ of the way through.)
>
> Would it have been faster to use a single transaction? Even if the commit
> at the end is epic-length, it wouldn’t be rewriting the b-tree nodes over
> and over again. If so, would periodic WAL checkpoints help?
>
> —Jens