Re: [sqlite] Multiple Independent Database Instances

Keith Medcalf Mon, 22 Apr 2019 16:45:56 -0700

Interesting.  If you can guarantee that you will only have a single thread 
accessing a single database only from one single thread, give it a try with 
SQLITE_OPEN_NOMUTEX | SQLITE_OPEN_READONLY in the flags parameter of 
sqlite3_open_v2 ...


Don't know if it will make a difference, but it might.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Lee, Jason
>Sent: Monday, 22 April, 2019 17:33
>To: sqlite-users@mailinglists.sqlite.org
>Subject: Re: [sqlite] Multiple Independent Database Instances
>
>> How does each thread know whether the file has been "previously
>processed" or not?
>
>
>The paths are pushed onto a queue and each thread pops the top off. I
>am also looking into the queuing code to see if there are issues
>
>
>> In other words, if you "get rid of" all the sqlite3 processing and
>replace it with a 5 ms sleep, does increasing the number of threads
>exhibit the same symptom?
>
>
>The timings were for sqlite3_open_v2, not for the whole process. The
>current code is effectively just an sqlite3_open_v2 followed by an
>sqlite3_close, and yet the time it takes to complete sqlite3_open_v2
>still increases with the number of threads.
>
>
>> Even with gobs of RAM and solid-state storage, I/O will quickly
>> bottleneck because the processor is 3 orders of magnitude faster
>than
>> RAM and 6 orders faster than the disk.  Once you exhaust the I/O
>bus,
>> it's exhausted.
>
>
>I/O is not the bottleneck. I have 8 NVMe drives in RAID0. I have not
>been able to drive the disks in the slightest because the threads
>spend the majority of their time in sqlite3_open_v2, sqlite3_close,
>and sqlite3_prepare_v2.
>
>
>Jason Lee
>
>________________________________
>From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> on
>behalf of James K. Lowden <jklow...@schemamania.org>
>Sent: Monday, April 22, 2019 4:53:42 PM
>To: sqlite-users@mailinglists.sqlite.org
>Subject: Re: [sqlite] Multiple Independent Database Instances
>
>On Mon, 22 Apr 2019 21:25:31 +0000
>"Lee, Jason" <jason...@lanl.gov> wrote:
>
>> I have a set of several million database files sitting on my
>> filesystem. Each thread will open a previously unprocessed database
>> file, do some queries, close the database, and move on to the next
>> unprocessed database file.
>
>Fascinating.  One wonders what Feynman would have said.
>
>Even with gobs of RAM and solid-state storage, I/O will quickly
>bottleneck because the processor is 3 orders of magnitude faster than
>RAM and 6 orders faster than the disk.  Once you exhaust the I/O bus,
>it's exhausted.
>
>I would build a pipeline, and let processes do the work. Write a
>program
>to process a single database: open, query, output, close.  Then
>define
>a make(1) rule to convert one database into one output file.  Then
>run
>"make -j dd" where "dd" is the number of simultaneous processes (or
>"jobs").  I think you'll find ~10 processes is all you can sustain.
>
>You could use the sqlite3 utility as your "program", but it's not
>very
>good at detecting errors and returning a nonzero return status to the
>OS. Hence a bespoke program.  Also, you can get the data into binary
>form, suitable for concatenation into one big file for input into
>your
>numerical process.  That will go a lot faster.
>
>Although there's some overhead to invoking a million processes, it's
>dwarfed by the I/O time.
>
>The advantage of doing the work under make is that it's reusable and
>restartable.  if you bury the machine, you can kill make and restart
>it
>with a lower number of jobs.  If you find some databases are corrupt
>or
>incomplete, you can replace them, and make will reprocess only the
>new
>ones.  If you add other databases at a later time, make will process
>only those.  You can add subsequent steps, too; make won't start from
>square 1 unless it has to.
>
>With millions of inputs, the odds are you will find problems.
>Perfectly good input over a dataset that size probably occured before
>in recorded history, but not frequently.
>
>I assume your millions of databases are not in a single directory;
>I'd
>guess you have 1000s of directories.  They offer convenient work
>partitions, which you might need; I have no idea how make will
>respond
>to a dependency tree with millions of nodes.
>
>--jkl
>
>_______________________________________________
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>_______________________________________________
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Multiple Independent Database Instances

Reply via email to