What I'd do:

1) Look at the indices, and make sure the input is sorted so you're
inserting in index order. Also drop any unnecessary indices and add them
back at the end. [Read the code for vacuum to see what kinds of things
make sense to defer.]
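For instance (just a sketch; the articles table and title index are made
up, not from Jens's schema):

    -- Before the load: keep only the table's primary key / rowid.
    DROP INDEX IF EXISTS articles_title_idx;

    -- ...bulk-insert the rows in ascending rowid order...

    -- After the load: rebuild the secondary index in one pass.
    CREATE INDEX articles_title_idx ON articles(title);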
2) Bump up the cache a lot. Inserting in sorted order makes this less
essential, but for a one-off like this it can't hurt.

3) Turn synchronous off while loading, and commit to starting over if you
get an OS-level crash. Even with synchronous off, things should still be
correctly ordered for app-level crashes.

4) Maybe use exclusive locking? I wouldn't expect that to have a huge
impact if you're batching inserts.

WRT #3, you could also consider journal_mode off (or memory, if your code
requires transactions to work right). In that case the database state is
indeterminate after an app-level crash, but you should be fine if you make
it to the end. (There's a rough sketch of the whole pragma setup in a P.S.
at the bottom.)

WRT #1, I would consider fts an index for these purposes, but it may
require schema changes to make it possible to selectively disable/enable
the indexing. See https://sqlite.org/fts5.html#external_content_tables .
I'm not sure there would be much gain from disabling fts during the load,
though, as long as your bulk transactions are large.

-scott

On Mon, Apr 24, 2017 at 11:00 AM, Jens Alfke <j...@mooseyard.com> wrote:

> I’m importing a large data set with a lot of rows — an entire Wikipedia
> dump, about 60GB, one article per row — into a brand new SQLite database
> in WAL mode. What’s the fastest way to import it?
>
> I started with one big transaction, but noted that (of course) the WAL
> file was growing rapidly while the main database file stayed tiny. I
> figured this would become inefficient, so I stopped the run and adjusted
> my code to commit and re-open a transaction every 10,000 rows.
>
> With that, the import started quickly, but as time went on the commits
> were taking longer and longer, so the process was spending most of its
> time committing. (I wasn’t able to finish the job, as it ran into an
> unrelated fatal error in my code about ⅔ of the way through.)
>
> Would it have been faster to use a single transaction? Even if the commit
> at the end is epic-length, it wouldn’t be rewriting the b-tree nodes over
> and over again. If so, would periodic WAL checkpoints help?
>
> —Jens
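P.S. Putting 2), 3), and the journal_mode idea together, the load-time
setup might look roughly like this (a sketch only; the cache size is an
arbitrary placeholder, not a recommendation):

    PRAGMA journal_mode = MEMORY;    -- or OFF, if you don't need rollback at all
    PRAGMA synchronous = OFF;        -- no fsync; restart the load after an OS crash
    PRAGMA cache_size = -1000000;    -- negative means KiB, so ~1GB of page cache
    PRAGMA locking_mode = EXCLUSIVE; -- optional, per 4)

    BEGIN;
    -- ...bulk INSERTs, committing and re-opening every so often...
    COMMIT;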
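P.P.S. The external-content arrangement from that fts5 link, roughly (the
articles_fts/articles names and the body/id columns are made up):

    CREATE VIRTUAL TABLE articles_fts USING fts5(
        body,
        content = 'articles',
        content_rowid = 'id'
    );

    -- Skip the usual sync triggers during the bulk load, then populate
    -- the fts index in one pass at the end:
    INSERT INTO articles_fts(articles_fts) VALUES('rebuild');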