Re: [sqlite] Parallel access to read only in memory database

2016-10-13 Thread Howard Chu

Daniel Meyer wrote:

We are interested in using sqlite as a read only, in memory, parallel
access database.  We have database files that are on the order of 100GB
that we are loading into memory.  We have found great performance when
reading from a single thread.  We need to scale up to have many parallel
reader threads.  Once the DB is created it never needs to be modified.  How
can we allow many reader threads on an in memory, write once read many
times database and achieve multi-core performance?  Is this possible with
sqlite?


Thanks for all the helpful responses.  I have moved forward experimenting
with using parallel readers on an in memory sqlite database.  I have found
that I get true concurrency (multi-core speed up) when I create a new
connection to my database file on disk in every thread.  I've verified this
by running a single long query and then running the same query in several
threads and ensuring the net time is the same as the single thread query
time.

In order to get parallel readers on an in memory database I first loaded
the file into memory with:
rc = sqlite3_open_v2("file::memory:?cache=shared", , SQLITE_OPEN_URI |
SQLITE_OPEN_READWRITE, NULL);
I hold onto the db reference in the main thread after loading the data and
don't close that connection until all the worker threads are done
attempting to run my long query.

I then spawn N threads, each creating their own connection like so:
rc = sqlite3_open_v2("file::memory:?cache=shared", , SQLITE_OPEN_URI |
SQLITE_OPEN_READONLY | SQLITE_OPEN_NOMUTEX, NULL);
I have experimented with the various flags; however, everything I do gives
me serial performance.  There is some kind of mutex locking the database so
that running my query say two times takes twice as long as running it once
whereas with the disk based approach using this connection string in each
thread:
rc = sqlite3_open("data.sl3", );
makes the total time to run the query twice the same as running it just
once.

I am hypothesizing that since we are using 'cache=shared' and hence each
thread is sharing the same cache each read requires locking the database.
What I would like is to get the same kind of behavior as we get with "
file::memory:?cache=shared" wherein every time I open a new connection that
connection points to the same memory; however, does not actually involve
sharing the cache so no global mutex locks the database on every read.

I have put my test code in a gist.
My C code is here:
https://gist.github.com/danielrmeyer/fae54d5993f2800626c616e72782b5eb
I generate the 1.5GB test database with this python 3.4 script:
https://gist.github.com/danielrmeyer/bfa415256502471d1512f2155e76adc2

I compiled the C code on my system with the following command:
gcc -std=gnu99 test.c -o test -lsqlite3 -lpthread (I did download the
amalgamation and copied the sqlite.h and sqlite.o files into my cwd after
building)

I apologize if the C code is not as clean as it could be.  I'm primarily a
Python programmer but figured i'd be more likely to get help with a C test
case so I did my best to hack this together.  The Python GIL was confusing
the situation in any case.

A little background on what I am doing:  I have several large datasets that
I wish to serve up to customers to generate custom reports based on unique
slices of the data.  I am using a cluster of machines with .5TB of memory
each so loading all the data into memory is reasonable in my case.  I've
found that against my production work load I get massive speedups in single
threaded tests against the in memory database relative to the disk
version.  In fact I have found that the single threaded sqlite in memory
tests are faster than all the other database solutions i've looked at so I
am very excited about using sqlite, nevertheless I really need to scale to
many cores.  Also, my work load is highly random so cache is not much
help.  I really want the data in memory.  Any help is greatly appreciated.
I have started experimenting with memory mapped io; however, I have not had
much luck so far.


Use LMDB in SQLightning. Since LMDB reads acquire no locks, they scale 
perfectly linearly across arbitrarily many CPUs. No other DB engine will do 
what you're looking for. Everything else based on locking will bottleneck as 
soon as you extend beyond a single CPU socket.



I would not mind creating a fork of sqlite on github and hacking the code
if someone could give me pointers on what needs to be modified to get this
working.  Certainly if there is an extra flag or URI I need to use to get
concurrent in memory read access that would be great, but I'm willing to
try and modify the source code and sharing with the community if I can
figure out how to get this going.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list

Re: [sqlite] Parallel access to read only in memory database

2016-10-13 Thread Daniel Meyer
 reach the person managing the list at
> sqlite-users-ow...@mailinglists.sqlite.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of sqlite-users digest..."
>
>
> Today's Topics:
>
>1. Parallel access to read only in memory database (Daniel Meyer)
>2. Re: Parallel access to read only in memory database
>   (Joe Mistachkin)
>3. Re: Parallel access to read only in memory database (Simon Slavin)
>4. Re: Parallel access to read only in memory database
>   (Stephen Chrzanowski)
>5. Re: Parallel access to read only in memory database
>   (Keith Medcalf)
>6. Protecting databases (Damien Sykes-Lindley)
>7. Re: Protecting databases (Darren Duncan)
>8. Re: Protecting databases (Damien Sykes-Lindley)
>9. Re: Protecting databases (Darren Duncan)
>   10. Re: Protecting databases (Darren Duncan)
>   11. Re: Protecting databases (Jean-Christophe Deschamps)
>   12. Re: Protecting databases (R Smith)
>   13. Re: Protecting databases (Damien Sykes-Lindley)
>   14. Error trying to make sqlite3 documentation
>   (Domingo Alvarez Duarte)
>   15. Re: Error trying to make sqlite3 documentation (Richard Hipp)
>   16. Fwd: Lemon: Simple recursive rule causes assertion failed:
>   stateno <= YY_SHIFT_COUNT (Conor O)
>
>
> --------------
>
> Message: 1
> Date: Fri, 7 Oct 2016 13:37:47 -0700
> From: Daniel Meyer <meye1...@gmail.com>
> To: sqlite-users@mailinglists.sqlite.org
> Subject: [sqlite] Parallel access to read only in memory database
> Message-ID:
> <CAFcY+X65nGF+jw6u+yFA0fChsEBkfNFoDy-=+NmP86cVrUE
> 9...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> We are interested in using sqlite as a read only, in memory, parallel
> access database.  We have database files that are on the order of 100GB
> that we are loading into memory.  We have found great performance when
> reading from a single thread.  We need to scale up to have many parallel
> reader threads.  Once the DB is created it never needs to be modified.  How
> can we allow many reader threads on an in memory, write once read many
> times database and achieve multi-core performance?  Is this possible with
> sqlite?
>
>
> --
>
> Message: 2
> Date: Fri, 7 Oct 2016 13:45:56 -0700
> From: "Joe Mistachkin" <sql...@mistachkin.com>
> To: "'SQLite mailing list'" <sqlite-users@mailinglists.sqlite.org>
> Subject: Re: [sqlite] Parallel access to read only in memory database
> Message-ID: <630C1D80247B4261AEFA40822A7605A0@LACHRYMOSE>
> Content-Type: text/plain;   charset="us-ascii"
>
>
> Daniel Meyer wrote:
> >
> > How can we allow many reader threads on an in memory, write once read
> many
> > times database and achieve multi-core performance?  Is this possible with
> > sqlite?
> >
>
> Have you tried using the URI "file::memory:?cache=shared" with one of the
> sqlite3_open*() C APIs?  Further details on using URI file names may be
> found here:
>
> https://www.sqlite.org/uri.html
>
> --
> Joe Mistachkin @ https://urn.to/r/mistachkin
>
>
>
> --
>
> Message: 3
> Date: Fri, 7 Oct 2016 21:48:51 +0100
> From: Simon Slavin <slav...@bigfraud.org>
> To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
> Subject: Re: [sqlite] Parallel access to read only in memory database
> Message-ID: <3a3bf749-3164-4d99-9494-0be186f46...@bigfraud.org>
> Content-Type: text/plain; charset=us-ascii
>
>
> On 7 Oct 2016, at 9:37pm, Daniel Meyer <meye1...@gmail.com> wrote:
>
> > We have database files that are on the order of 100GB [...] in memory
>
> You have 100GB memory ?
>
> Simon.
>
>
> --
>
> Message: 4
> Date: Fri, 7 Oct 2016 17:03:24 -0400
> From: Stephen Chrzanowski <pontia...@gmail.com>
> To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
> Subject: Re: [sqlite] Parallel access to read only in memory database
> Message-ID:
> <CAANptrH4CK+17piiaPT+vw0HqFNjr4ULwotuJyLfy8jKFwyOUA
> @mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> (My two cents) I just setup two brand new machines in our Colo for ESX.
> Both machines had 256gig of memory.  Not unheard of in server situations.
> ;)
>
> On Fri, Oct 7, 2016 at 4:48 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>
> >
> > On 7 Oct 2016, at 9:37pm, Daniel Meyer <meye1...@gmail.com> wrote:
> >
> > > We hav

Re: [sqlite] Parallel access to read only in memory database

2016-10-11 Thread Howard Chu

Jens Alfke wrote:



On Oct 9, 2016, at 10:41 AM, Howard Chu  wrote:

As for code freshness, I've seen no compelling new features from 3.8.x onward 
that would improve performance so there's been no reason to update further.


Perhaps, but there’s important new functionality in newer versions, such as 
partial indexes and indexes on expressions.

If it’s a personal project for you, or a tech demo, then it’s understandable 
for you to drop it when it stops being interesting; but if this is something 
intended for other people to use, they’re going to want to see it supported 
going forward (the way SQLCipher is, for instance.) It would be nice to add a 
note to the readme saying something like “FYI, this is based on SQLite 3.7.17 
and there are no plans to sync with newer versions.”


As with any open source project, if users want to see something change, the 
onus is on them to initiate those changes. Nobody associated with an open 
source project is ever obligated to proactively implement anything. 
particularly things that no user has asked for yet. SQLightning has many 
satisfied users already and none of them have requested the features you 
mentioned.


As for myself, it works for me in my personal builds of Mozilla Seamonkey.

Lack of development activity on a project doesn't mean it's dead and disused - 
it just means the project has accomplished its goals and its users are content.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-10 Thread Jens Alfke

> On Oct 9, 2016, at 10:41 AM, Howard Chu  wrote:
> 
> As for code freshness, I've seen no compelling new features from 3.8.x onward 
> that would improve performance so there's been no reason to update further.

Perhaps, but there’s important new functionality in newer versions, such as 
partial indexes and indexes on expressions.

If it’s a personal project for you, or a tech demo, then it’s understandable 
for you to drop it when it stops being interesting; but if this is something 
intended for other people to use, they’re going to want to see it supported 
going forward (the way SQLCipher is, for instance.) It would be nice to add a 
note to the readme saying something like “FYI, this is based on SQLite 3.7.17 
and there are no plans to sync with newer versions.”

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-09 Thread Howard Chu

Jens Alfke wrote:



On Oct 9, 2016, at 8:15 AM, Howard Chu  wrote:

Use SQLightning, it's designed specifically for write once read many workloads.


"SQLite3 ported to use LMDB instead of its original Btree code” — sounds great, 
and the performance figures quoted in the readme are impressive. But the source code 
appears to be a modified version of SQLite’s source, which would make it very 
difficult to keep in sync with SQLite, and (as Domingo pointed out) the last commit 
is three years old and seems to be based on SQLite 3.7.17. So far this looks like an 
exciting proof-of-concept, but not something I’d use in a real project.

(By comparison, SQLCipher is also released as a modified copy of SQLite, but 
they sync with SQLite regularly; the latest version from this April is based on 
3.11.0.)

It would be best if this were implemented as a separate plugin, but as I’m not 
familiar with the innards of SQLite, I’ll assume that simply wasn't feasible. 
(I know SQLite supports VFS plugins, but I think those just operate at the 
paging layer, below the b-tree.)


SQLite3's code structure is monolithic, not modular, so there is no clean way 
to replace its underlying Btree layer. It's necessarily a hack-and-slash 
proposition.


As for code freshness, I've seen no compelling new features from 3.8.x onward 
that would improve performance so there's been no reason to update further. 
Many of the SQLite performance enhancements from 3.8 are in its own Btree 
code, which is entirely ripped out in SQLightning and thus irrelevant. On the 
other hand, there have been multiple regressions (performance and otherwise) 
in subsequent releases, most often in the query planner, which I've chosen to 
avoid.


E.g.
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg85558.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg86191.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg86901.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg89666.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg91201.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg92189.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg99646.html

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-09 Thread Jens Alfke

> On Oct 9, 2016, at 8:15 AM, Howard Chu  wrote:
> 
> Use SQLightning, it's designed specifically for write once read many 
> workloads.

"SQLite3 ported to use LMDB instead of its original Btree code” — sounds great, 
and the performance figures quoted in the readme are impressive. But the source 
code appears to be a modified version of SQLite’s source, which would make it 
very difficult to keep in sync with SQLite, and (as Domingo pointed out) the 
last commit is three years old and seems to be based on SQLite 3.7.17. So far 
this looks like an exciting proof-of-concept, but not something I’d use in a 
real project.

(By comparison, SQLCipher is also released as a modified copy of SQLite, but 
they sync with SQLite regularly; the latest version from this April is based on 
3.11.0.)

It would be best if this were implemented as a separate plugin, but as I’m not 
familiar with the innards of SQLite, I’ll assume that simply wasn't feasible. 
(I know SQLite supports VFS plugins, but I think those just operate at the 
paging layer, below the b-tree.)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-09 Thread Domingo Alvarez Duarte

Hello !

But SQLightning is a dead project, isn't it ?

version 3.7.17 
 



hyc  
committed on 12 Sep 2013


Cheers !

On 09/10/16 12:15, Howard Chu wrote:

Daniel Meyer wrote:

We are interested in using sqlite as a read only, in memory, parallel
access database.  We have database files that are on the order of 100GB
that we are loading into memory.  We have found great performance when
reading from a single thread.  We need to scale up to have many parallel
reader threads.  Once the DB is created it never needs to be 
modified.  How

can we allow many reader threads on an in memory, write once read many
times database and achieve multi-core performance?  Is this possible 
with

sqlite?


Use SQLightning, it's designed specifically for write once read many 
workloads.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-09 Thread Howard Chu

Daniel Meyer wrote:

We are interested in using sqlite as a read only, in memory, parallel
access database.  We have database files that are on the order of 100GB
that we are loading into memory.  We have found great performance when
reading from a single thread.  We need to scale up to have many parallel
reader threads.  Once the DB is created it never needs to be modified.  How
can we allow many reader threads on an in memory, write once read many
times database and achieve multi-core performance?  Is this possible with
sqlite?


Use SQLightning, it's designed specifically for write once read many workloads.

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-08 Thread Jens Alfke

> On Oct 7, 2016, at 1:45 PM, Joe Mistachkin  wrote:
> 
> Have you tried using the URI "file::memory:?cache=shared”

Shared cache will definitely help when using multiple read-only connections, 
but I have a feeling memory-mapping would help even more since it would 
hopefully eliminate redundant memcpy calls between the database and the “file” 
cache.

Daniel, try using “pragma mmap_size=N” where N is basically the minimum of the 
size of the database file and the amount of free RAM.

> How can we allow many reader threads on an in memory, write once read many
> times database and achieve multi-core performance?

This part’s easy: just open a database connection per thread. Using the 
shared-cache mode as above means the threads won’t each allocate their own 
cache, which will save memory and improve locality of reference.

—Jens

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-07 Thread Keith Medcalf

Machines with >100GB of RAM have been commonplace for a several years.  These 
days, 384 GB is quite common.  

Even 1 TB is not a "special build" anymore -- you can buy them "off the shelf" 
from Dell ... (Dell no longer makes custom machines but only sells fixed 
configurations off the boat from china)

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Simon Slavin
> Sent: Friday, 7 October, 2016 14:49
> To: SQLite mailing list
> Subject: Re: [sqlite] Parallel access to read only in memory database
> 
> 
> On 7 Oct 2016, at 9:37pm, Daniel Meyer <meye1...@gmail.com> wrote:
> 
> > We have database files that are on the order of 100GB [...] in memory
> 
> You have 100GB memory ?
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-07 Thread Stephen Chrzanowski
(My two cents) I just setup two brand new machines in our Colo for ESX.
Both machines had 256gig of memory.  Not unheard of in server situations. ;)

On Fri, Oct 7, 2016 at 4:48 PM, Simon Slavin  wrote:

>
> On 7 Oct 2016, at 9:37pm, Daniel Meyer  wrote:
>
> > We have database files that are on the order of 100GB [...] in memory
>
> You have 100GB memory ?
>
> Simon.
> ___
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-07 Thread Simon Slavin

On 7 Oct 2016, at 9:37pm, Daniel Meyer  wrote:

> We have database files that are on the order of 100GB [...] in memory

You have 100GB memory ?

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-07 Thread Joe Mistachkin

Daniel Meyer wrote:
>
> How can we allow many reader threads on an in memory, write once read many
> times database and achieve multi-core performance?  Is this possible with
> sqlite? 
> 

Have you tried using the URI "file::memory:?cache=shared" with one of the
sqlite3_open*() C APIs?  Further details on using URI file names may be
found here:

https://www.sqlite.org/uri.html

--
Joe Mistachkin @ https://urn.to/r/mistachkin

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Parallel access to read only in memory database

2016-10-07 Thread Daniel Meyer
We are interested in using sqlite as a read only, in memory, parallel
access database.  We have database files that are on the order of 100GB
that we are loading into memory.  We have found great performance when
reading from a single thread.  We need to scale up to have many parallel
reader threads.  Once the DB is created it never needs to be modified.  How
can we allow many reader threads on an in memory, write once read many
times database and achieve multi-core performance?  Is this possible with
sqlite?
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users