Interesting. If you can guarantee that you will only have a single thread accessing a single database only from one single thread, give it a try with SQLITE_OPEN_NOMUTEX | SQLITE_OPEN_READONLY in the flags parameter of sqlite3_open_v2 ...
Don't know if it will make a difference, but it might. --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-----Original Message----- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Lee, Jason >Sent: Monday, 22 April, 2019 17:33 >To: sqlite-users@mailinglists.sqlite.org >Subject: Re: [sqlite] Multiple Independent Database Instances > >> How does each thread know whether the file has been "previously >processed" or not? > > >The paths are pushed onto a queue and each thread pops the top off. I >am also looking into the queuing code to see if there are issues > > >> In other words, if you "get rid of" all the sqlite3 processing and >replace it with a 5 ms sleep, does increasing the number of threads >exhibit the same symptom? > > >The timings were for sqlite3_open_v2, not for the whole process. The >current code is effectively just an sqlite3_open_v2 followed by an >sqlite3_close, and yet the time it takes to complete sqlite3_open_v2 >still increases with the number of threads. > > >> Even with gobs of RAM and solid-state storage, I/O will quickly >> bottleneck because the processor is 3 orders of magnitude faster >than >> RAM and 6 orders faster than the disk. Once you exhaust the I/O >bus, >> it's exhausted. > > >I/O is not the bottleneck. I have 8 NVMe drives in RAID0. I have not >been able to drive the disks in the slightest because the threads >spend the majority of their time in sqlite3_open_v2, sqlite3_close, >and sqlite3_prepare_v2. > > >Jason Lee > >________________________________ >From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> on >behalf of James K. Lowden <jklow...@schemamania.org> >Sent: Monday, April 22, 2019 4:53:42 PM >To: sqlite-users@mailinglists.sqlite.org >Subject: Re: [sqlite] Multiple Independent Database Instances > >On Mon, 22 Apr 2019 21:25:31 +0000 >"Lee, Jason" <jason...@lanl.gov> wrote: > >> I have a set of several million database files sitting on my >> filesystem. Each thread will open a previously unprocessed database >> file, do some queries, close the database, and move on to the next >> unprocessed database file. > >Fascinating. One wonders what Feynman would have said. > >Even with gobs of RAM and solid-state storage, I/O will quickly >bottleneck because the processor is 3 orders of magnitude faster than >RAM and 6 orders faster than the disk. Once you exhaust the I/O bus, >it's exhausted. > >I would build a pipeline, and let processes do the work. Write a >program >to process a single database: open, query, output, close. Then >define >a make(1) rule to convert one database into one output file. Then >run >"make -j dd" where "dd" is the number of simultaneous processes (or >"jobs"). I think you'll find ~10 processes is all you can sustain. > >You could use the sqlite3 utility as your "program", but it's not >very >good at detecting errors and returning a nonzero return status to the >OS. Hence a bespoke program. Also, you can get the data into binary >form, suitable for concatenation into one big file for input into >your >numerical process. That will go a lot faster. > >Although there's some overhead to invoking a million processes, it's >dwarfed by the I/O time. > >The advantage of doing the work under make is that it's reusable and >restartable. if you bury the machine, you can kill make and restart >it >with a lower number of jobs. If you find some databases are corrupt >or >incomplete, you can replace them, and make will reprocess only the >new >ones. If you add other databases at a later time, make will process >only those. You can add subsequent steps, too; make won't start from >square 1 unless it has to. > >With millions of inputs, the odds are you will find problems. >Perfectly good input over a dataset that size probably occured before >in recorded history, but not frequently. > >I assume your millions of databases are not in a single directory; >I'd >guess you have 1000s of directories. They offer convenient work >partitions, which you might need; I have no idea how make will >respond >to a dependency tree with millions of nodes. > >--jkl > >_______________________________________________ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >_______________________________________________ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users