[sqlite] Are there practical limits to a not-so simple schema?

2016-01-13 Thread Simon Slavin

On 13 Jan 2016, at 7:36pm, Warren Young  wrote:

> On Jan 13, 2016, at 5:29 AM, Simon Slavin  wrote:
> 
>> My only concern with what you wrote is that you mention 100 tables.  In 
>> order to find the table you specify SQLite has to go through a list of 
>> (hashed, I think) table names, and going through an average of 50 of them 
>> per command can be slow.
> 
> Wouldn't that be log2(100) = 6.6 or ln(100) = 4.6 maximum node visits?
> 
> Most hash table implementations have logarithmic lookup time, not linear.

You're quite right.  I wrote "50" first, then I remembered that the list was 
hashed.  Thanks for the correction.

Simon.


[sqlite] Are there practical limits to a not-so simple schema?

2016-01-13 Thread Olivier Mascia

> On 13 Jan 2016, at 16:28, Dominique Devienne wrote:
> 
>> "When a read operation begins on a WAL-mode database, it first remembers
>> the location of the last valid commit record in the WAL. Call this point
>> the "end mark". Because the WAL can be growing and adding new commit
>> records while various readers connect to the database, each reader can
>> potentially have its own end mark. But for any particular reader, the end
>> mark is unchanged for the duration of the transaction, thus ensuring that a
>> single read transaction only sees the database content as it existed at a
>> single point in time."
>> 
> 
> That does sound like MVCC indeed, although I guess a true MVCC DB allows
> concurrent readers and writer(s), i.e. here new readers could grab the
> "last" "end mark" from before the write started. Does WAL mode support
> that? But if your writes are as small as you say, perhaps readers can
> retry later with the timeout feature of SQLite. --DD

Kind of MVCC, but not full MVCC indeed, at least not like Firebird's MVCC for 
instance. But good enough for my needs if it really fits the description, for 
which I'll write a dedicated test.

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om




[sqlite] Are there practical limits to a not-so simple schema?

2016-01-13 Thread Dominique Devienne
On Wed, Jan 13, 2016 at 4:06 PM, Olivier Mascia  wrote:

> Some kind of MVCC is very interesting to us for the purpose of running
> read transactions which see a stable view of the data, without seeing any
> writes committed after the start of the read transaction. If I'm not
> misinterpreting http://www.sqlite.org/wal.html, this is what SQLite
> gives us:
>
> "When a read operation begins on a WAL-mode database, it first remembers
> the location of the last valid commit record in the WAL. Call this point
> the "end mark". Because the WAL can be growing and adding new commit
> records while various readers connect to the database, each reader can
> potentially have its own end mark. But for any particular reader, the end
> mark is unchanged for the duration of the transaction, thus ensuring that a
> single read transaction only sees the database content as it existed at a
> single point in time."
>

That does sound like MVCC indeed, although I guess a true MVCC DB allows
concurrent readers and writer(s), i.e. here new readers could grab the
"last" "end mark" from before the write started. Does WAL mode support that?
But if your writes are as small as you say, perhaps readers can retry later
with the timeout feature of SQLite. --DD
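
For reference, that timeout feature is the busy handler; a minimal sketch of
enabling it through the C API (the 2-second value is an arbitrary choice):

    #include <sqlite3.h>

    // Minimal sketch: install SQLite's built-in busy handler so that a
    // connection retries for up to 2000 ms when it hits a locked database,
    // instead of failing immediately with SQLITE_BUSY.
    void enable_retry(sqlite3 *db) {
      sqlite3_busy_timeout(db, 2000);  // milliseconds; value is arbitrary here
    }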


[sqlite] Are there practical limits to a not-so simple schema?

2016-01-13 Thread Paul

If you are ever going to use ANALYZE on your database, and the database is 
going to be opened frequently (like once per request), consider dropping the 
sqlite_stat3 and sqlite_stat4 tables.

SQLite reads the content of those tables on each open, and the number of 
tables greatly contributes to the amount of data stored in there. For me, 
with fewer than 30 tables, the penalty of keeping sqlite_stat3 and 
sqlite_stat4 around after ANALYZE comes to some 5 milliseconds of overhead 
on each open.
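
A minimal sketch of that suggestion through the C API (error handling
trimmed; it relies on SQLite permitting DROP TABLE on the sqlite_stat*
tables, as the advice above implies):

    #include <cstdio>
    #include <sqlite3.h>

    // Run ANALYZE once, then drop sqlite_stat3/sqlite_stat4 so that
    // subsequent opens only need to load the compact sqlite_stat1
    // statistics.
    int analyze_and_trim(sqlite3 *db) {
      char *errmsg = nullptr;
      int rc = sqlite3_exec(db,
          "ANALYZE;"
          "DROP TABLE IF EXISTS sqlite_stat3;"
          "DROP TABLE IF EXISTS sqlite_stat4;",
          nullptr, nullptr, &errmsg);
      if (rc != SQLITE_OK) {
        std::fprintf(stderr, "analyze/trim failed: %s\n", errmsg);
        sqlite3_free(errmsg);
      }
      return rc;
    }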

13 January 2016, 13:43:06, by "Olivier Mascia" :

>   Hello,
> 
> Is there any known structural performance issue working with a schema made
> of about 100 tables, about 80 foreign key constraints, and some indexes in
> addition to those implicit in the primary keys and foreign keys? In my book
> it does not qualify as a complex schema: some tables would have 30 to 40
> columns, and 4 or 5 tables are candidates for a moderate number of rows
> (rarely more than 1 million), while one of the tables could receive about
> 10 million rows after some years of data collection (so again nothing
> really fancy).
> 
> Does sqlite have to reparse the schema text often to execute the queries?
> Or is the schema somehow translated internally to a stored, digested
> ('compiled') format, to ease its working?
> 
> The application which would use this schema is a server-side application
> (quite along the lines described in http://sqlite.org/whentouse.html).  We
> have started experimentation and things look very good, excellent I should
> say, so the above question is more about which details to supervise, which
> traps to avoid.  I'm pretty sure there are people here with valuable
> background with similar datasets.
> 
> Thanks,
> -- 
> Meilleures salutations, Met vriendelijke groeten, Best Regards,
> Olivier Mascia, integral.be/om




[sqlite] Are there practical limits to a not-so simple schema?

2016-01-13 Thread Olivier Mascia
Thanks Simon & Dominique,

> On 13 Jan 2016, at 13:29, Simon Slavin wrote:
> 
>> Does sqlite have to reparse the schema text often to execute the queries?
>> Or is the schema somehow translated internally to a stored, digested
>> ('compiled') format, to ease its working?
> 
> The first time SQLite needs the database schema it reads it from the file 
> and, as you suspected, stores it internally (in a more convenient and 
> efficient format).  The internal version is valid only until a transaction 
> ends and the database is unlocked, because after that another process might 
> create/delete a table, or create a UNIQUE index or something.  However, a 
> schema-version counter is stored in the database and if, on the next command, 
> SQLite sees that this hasn't changed it will continue to use its internal 
> copy.

Perfect.

>> The application which would use this schema is a server-side application 
>> (quite along the lines described in http://sqlite.org/whentouse.html).  We 
>> have started experimentation and things look very good, excellent I should 
>> say, so the above question is more about which details to supervise, which 
>> traps to avoid.  I'm pretty sure there are people here with valuable 
>> background with similar datasets.
> 
> I have written server-side front ends to SQLite databases without problems.
> Any problems came from network communications, not SQLite.

Network is not an issue for us. :) All the processing is inside the server, 
including the GUI for users, which is remotely served through a specific 
protocol (more or less the same idea as a server serving HTML and a browser 
displaying it, except that what is served is not HTML and the browser is a 
dedicated client app). One hour of typical data-entry tasks by a single 
person using a platform-native GUI, which makes you think the whole 
application is local when it isn't, can weigh one or two hundred KB of 
network bytes in total. Thin. Very thin. Not comparable to remote desktop 
solutions: we don't transfer pixels but short instructions to draw a window, 
place a control, put this string inside, and on the reverse channel we get 
events (including updated data) only for the GUI actions which are 
significant to the application logic on the server side.

> My only concern with what you wrote is that you mention 100 tables.  In order 
> to find the table you specify SQLite has to go through a list of (hashed, I 
> think) table names, and going through an average of 50 of them per command 
> can be slow.  If you have many tables with the same columns it's more 
> efficient to merge them into one table with an extra column for the original 
> table-name.  Not only can this speed things up but it makes the database file 
> smaller since each table and index in the database has its own list of pages 
> of data.

Hmm. Indeed this needs further analysis. I'll dig into the code, now that I 
know what to look for.

Thanks a lot Simon.


> On 13 Jan 2016, at 13:36, Dominique Devienne wrote:
> 
>> The application which would use this schema is a server-side application
>> (quite along the lines described in http://sqlite.org/whentouse.html).
>> We have started experimentation and things look very good, excellent I
>> should say, so the above question is more about which details to supervise,
>> which traps to avoid.  I'm pretty sure there are people here with valuable
>> background with similar datasets.
>> 
> 
> In a server app, it depends on what kind of concurrency you want to support.
> Write locks are exclusive, and database-wide, and SQLite doesn't implement
> MVCC, so concurrency could be an issue. WAL mode is supposed to be better
> for concurrency, though.

It was well understood from the evaluation we did over the last 6 weeks. The 
outcome of our tests (see the sketch after this list):

- each processing task uses its own connection to the database
- each connection uses a private cache
- SQLite is set up with SQLITE_THREADSAFE=2 ('MULTITHREAD', not 'SERIALIZED'), 
because we never share a connection (and its dependencies, like statements) 
between the worker threads processing tasks
- the duration of every write transaction is out of the hands of the users, 
and they are all designed to be as short-lived as possible (we talk msec here 
for most of them); our server serializes those write transactions
- for now, we have opted for WAL mode
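
A sketch of that setup in code (function names are ours for illustration;
error handling trimmed):

    #include <sqlite3.h>

    // One-time library setup: SQLITE_CONFIG_MULTITHREAD selects at runtime
    // the same behavior as compiling with SQLITE_THREADSAFE=2, i.e.
    // connections are never shared between threads. Must run before any
    // other SQLite call.
    int init_sqlite_library() {
      return sqlite3_config(SQLITE_CONFIG_MULTITHREAD);
    }

    // Per-task connection: private page cache, WAL journal mode.
    // (WAL is persistent in the file; re-setting it on open is harmless.)
    int open_task_connection(const char *path, sqlite3 **pdb) {
      int rc = sqlite3_open_v2(path, pdb,
          SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_PRIVATECACHE,
          nullptr);
      if (rc != SQLITE_OK) return rc;
      return sqlite3_exec(*pdb, "PRAGMA journal_mode=WAL;",
                          nullptr, nullptr, nullptr);
    }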

Some kind of MVCC is very interesting to us for the purpose of running read 
transactions which see a stable view of the data, without seeing any writes 
committed after the start of the read transaction. If I'm not 
misinterpreting http://www.sqlite.org/wal.html, this is what SQLite gives us:

"When a read operation begins on a WAL-mode database, it first remembers the 
location of the last valid commit record in the WAL. Call this point the "end 
mark". Because the WAL can be growing and adding new commit records while 
various readers connect to the database, each reader can potentially have its 
own end mark. But for any 
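
A small sketch of what this gives in practice (table names are made up):

    #include <sqlite3.h>

    // In WAL mode, a read transaction pins its "end mark" when the first
    // read occurs, so every SELECT inside it sees the same point-in-time
    // snapshot, even if another connection commits meanwhile.
    void stable_read(sqlite3 *db) {
      sqlite3_exec(db, "BEGIN;", nullptr, nullptr, nullptr);  // deferred by default
      // First SELECT starts the read transaction and fixes the end mark:
      sqlite3_exec(db, "SELECT count(*) FROM orders;", nullptr, nullptr, nullptr);
      // ... other connections may COMMIT here; this snapshot won't see it ...
      sqlite3_exec(db, "SELECT count(*) FROM order_lines;", nullptr, nullptr, nullptr);
      sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);  // ends the read txn
    }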

[sqlite] Are there practical limits to a not-so simple schema?

2016-01-13 Thread Warren Young
On Jan 13, 2016, at 1:45 PM, Simon Slavin  wrote:
> 
> On 13 Jan 2016, at 7:36pm, Warren Young  wrote:
> 
>> Wouldn't that be log2(100) = 6.6 or ln(100) = 4.6 maximum node visits?
>> 
>> Most hash table implementations have logarithmic lookup time, not linear.
> 
> You're quite right.

No, not entirely. :)

Hash tables are not inherently balanced, so depending on the input data and 
the hashing scheme used, a lookup could be as bad as 100 visits.  A good 
hashing algorithm reduces the chance of that worst case to near zero, but 
some amount of imbalance is to be expected.

The logarithmic values are the typical best case, not the worst case.

Contrast a red-black or AVL tree, which is *always* balanced.  That's why C++ 
STL implementations use one of those two for std::map and such.  The efficiency 
guarantees in the C++ standard disallow implementing them in terms of hash 
tables.  Later (C++11) they added std::unordered_map and friends with different 
restrictions that allow hash table implementations.
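
For illustration, the trade-off in C++ terms (complexities as the standard
specifies them):

    #include <map>
    #include <string>
    #include <unordered_map>

    // std::map is a balanced tree (typically red-black): O(log n) lookup,
    // guaranteed, and the keys stay ordered.
    // std::unordered_map (C++11) is a hash table: O(1) average lookup, but
    // O(n) in the worst case when many keys collide.
    std::map<std::string, int> tree_index;            // ordered, logarithmic
    std::unordered_map<std::string, int> hash_index;  // unordered, amortized O(1)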


[sqlite] Are there practical limits to a not-so simple schema?

2016-01-13 Thread Dominique Devienne
On Wed, Jan 13, 2016 at 12:43 PM, Olivier Mascia  wrote:

> Is there any known structural performance issue working with a schema made
> of [many] tables, [...] foreign keys constraints, [...] indexes [...]
> primary keys and foreign keys. [...] tables would have [many] columns and
> [...] rows
>

SQLite scales well in general.


> Does sqlite have to reparse the schema text often to execute the queries?
> Or is the schema somehow translated internally to a stored, digested
> ('compiled') format, to ease its working?
>

Don't know. But I never heard about a 'compiled' form for the schema.
Statements, yes, that's the VDBE code, but I believe the schema is reparsed
from the text in sqlite_master when necessary. The SQL parser for SQLite is
quite fast BTW, and unless you recreate (and thus reparse) your prepared
statements all the time instead of caching them, it's unlikely to be an
issue IMHO.
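
A sketch of such statement caching (table and column names are made up):

    #include <sqlite3.h>

    // Prepare once, then bind/step/reset in a loop: the VDBE program is
    // compiled a single time instead of being reparsed on every use.
    void lookup_names(sqlite3 *db) {
      sqlite3_stmt *stmt = nullptr;
      sqlite3_prepare_v2(db, "SELECT name FROM item WHERE id = ?1;", -1,
                         &stmt, nullptr);
      for (int id = 1; id <= 3; ++id) {
        sqlite3_bind_int(stmt, 1, id);
        while (sqlite3_step(stmt) == SQLITE_ROW) {
          const unsigned char *name = sqlite3_column_text(stmt, 0);
          (void)name;  // use the value here
        }
        sqlite3_reset(stmt);  // rewind, keeping the compiled statement
      }
      sqlite3_finalize(stmt);
    }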


> The application which would use this schema is a server-side application
> (quite along the lines described in http://sqlite.org/whentouse.html).
> We have started experimentation and things look very good, excellent should
> I say, so the above question is more about which details to supervise,
> which traps to avoid.  I'm pretty sure there are people here with valuable
> background with similar datasets.
>

In a server app, it depends on what kind of concurrency you want to support.
Write locks are exclusive, and database-wide, and SQLite doesn't implement
MVCC, so concurrency could be an issue. WAL mode is supposed to be better
for concurrency, though.
If your data can be naturally partitioned into several database files (all
using the same schema), you can achieve better concurrency, since you can open
more than one DB file simultaneously. This is an approach we're using
internally; it assumes your FKs don't cross these DB file boundaries.
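
A sketch of that partitioning idea (file and attach names are hypothetical):

    #include <sqlite3.h>

    // Partition the data across several database files sharing one schema.
    // A single connection can ATTACH the partitions it needs; since SQLite
    // write locks are per database file, writers to different partitions
    // do not block each other.
    void attach_partitions(sqlite3 *db) {
      sqlite3_exec(db, "ATTACH DATABASE 'events_2015.db' AS y2015;",
                   nullptr, nullptr, nullptr);
      sqlite3_exec(db, "ATTACH DATABASE 'events_2016.db' AS y2016;",
                   nullptr, nullptr, nullptr);
      // Queries then qualify the table names:
      //   SELECT count(*) FROM y2015.event;
      //   SELECT count(*) FROM y2016.event;
    }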
Another gotcha relates to large blobs, if you need partial updates of them:
SQLite must rewrite the whole blob. That may not apply to you though. The
same issue arises with ALTER TABLE ADD COLUMN if you have large blobs:
SQLite must rewrite the rows (and the blobs), which can be very expensive.
The typical work-around (a hack IMHO) is to put your blobs in separate aux
tables.
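
A sketch of that aux-table layout (hypothetical schema):

    #include <sqlite3.h>

    // Keep large blobs out of the main table: a 1:1 side table keyed on
    // the same id, so schema changes and row rewrites on `doc` never
    // touch the blobs.
    void create_doc_schema(sqlite3 *db) {
      sqlite3_exec(db,
          "CREATE TABLE doc("
          "  id    INTEGER PRIMARY KEY,"
          "  title TEXT NOT NULL"
          ");"
          "CREATE TABLE doc_blob("
          "  doc_id INTEGER PRIMARY KEY REFERENCES doc(id),"
          "  data   BLOB"
          ");",
          nullptr, nullptr, nullptr);
    }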

There are other small gotchas (untyped / dynamically typed columns, legacy
non-standard behaviors, etc.), but the above are the two things that come
to mind given what you describe. --DD


[sqlite] Are there practical limits to a not-so simple schema?

2016-01-13 Thread Olivier Mascia
Hello,

Is there any known structural performance issue working with a schema made of 
about 100 tables, about 80 foreign key constraints, and some indexes in 
addition to those implicit in the primary keys and foreign keys? In my book it 
does not qualify as a complex schema: some tables would have 30 to 40 columns, 
and 4 or 5 tables are candidates for a moderate number of rows (rarely more 
than 1 million), while one of the tables could receive about 10 million rows 
after some years of data collection (so again nothing really fancy).

Does sqlite have to reparse the schema text often to execute the queries? Or is 
the schema somehow translated internally to a stored, digested ('compiled') 
format, to ease its working?

The application which would use this schema is a server-side application (quite 
along the lines described in http://sqlite.org/whentouse.html).  We have 
started experimentation and things look very good, excellent I should say, so 
the above question is more about which details to supervise, which traps to 
avoid.  I'm pretty sure there are people here with valuable background with 
similar datasets.

Thanks,
-- 
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om




[sqlite] Are there practical limits to a not-so simple schema?

2016-01-13 Thread Warren Young
On Jan 13, 2016, at 5:29 AM, Simon Slavin  wrote:
> 
> My only concern with what you wrote is that you mention 100 tables.  In order 
> to find the table you specify SQLite has to go through a list of (hashed, I 
> think) table names, and going through an average of 50 of them per command 
> can be slow.

Wouldn't that be log2(100) = 6.6 or ln(100) = 4.6 maximum node visits?

Most hash table implementations have logarithmic lookup time, not linear.


[sqlite] Are there practical limits to a not-so simple schema?

2016-01-13 Thread Simon Slavin

On 13 Jan 2016, at 11:43am, Olivier Mascia  wrote:

> Is there any known structural performance issue working with a schema made
> of about 100 tables, about 80 foreign key constraints, and some indexes in
> addition to those implicit in the primary keys and foreign keys? In my book
> it does not qualify as a complex schema: some tables would have 30 to 40
> columns, and 4 or 5 tables are candidates for a moderate number of rows
> (rarely more than 1 million), while one of the tables could receive about
> 10 million rows after some years of data collection (so again nothing
> really fancy).

None of these numbers are anywhere near the limits of SQLite.  If you want 
specific numbers (which you don't need to worry about with this database) see 
this:

http://www.sqlite.org/limits.html

> Does sqlite have to reparse the schema text often to execute the queries? Or 
> is the schema somehow translated internally to a stored, digested 
> ('compiled') format, to ease its working?

The first time SQLite needs the database schema it reads it from the file and, 
as you suspected, stores it internally (in a more convenient and efficient 
format).  The internal version is valid only until a transaction ends and the 
database is unlocked, because after that another process might create/delete a 
table, or create a UNIQUE index or something.  However, a schema-version 
counter is stored in the database and if, on the next command, SQLite sees that 
this hasn't changed it will continue to use its internal copy.
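
That schema-version counter is also visible to applications; a small
illustrative sketch (SQLite performs the equivalent check internally):

    #include <cstdio>
    #include <sqlite3.h>

    // Read the schema cookie that SQLite compares against its cached,
    // parsed schema to decide whether that cache is still valid.
    void print_schema_version(sqlite3 *db) {
      sqlite3_stmt *stmt = nullptr;
      sqlite3_prepare_v2(db, "PRAGMA schema_version;", -1, &stmt, nullptr);
      if (sqlite3_step(stmt) == SQLITE_ROW) {
        std::printf("schema_version = %d\n", sqlite3_column_int(stmt, 0));
      }
      sqlite3_finalize(stmt);
    }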

> The application which would use this schema is a server-side application 
> (quite along the lines described in http://sqlite.org/whentouse.html).  We 
> have started experimentation and things look very good, excellent I should 
> say, so the above question is more about which details to supervise, which 
> traps to avoid.  I'm pretty sure there are people here with valuable 
> background with similar datasets.

I have written server-side front ends to SQLite databases without problems.  
Any problems came from network communications, not SQLite.

My only concern with what you wrote is that you mention 100 tables.  In order 
to find the table you specify SQLite has to go through a list of (hashed, I 
think) table names, and going through an average of 50 of them per command can 
be slow.  If you have many tables with the same columns it's more efficient to 
merge them into one table with an extra column for the original table-name.  
Not only can this speed things up but it makes the database file smaller since 
each table and index in the database has its own list of pages of data.
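
A sketch of that merge (hypothetical names): instead of one table per source,
a single table whose extra column carries the former table name:

    #include <sqlite3.h>

    // Replace N same-shaped tables (reading_a, reading_b, ...) with one
    // table whose first column identifies the former table.
    void create_merged_table(sqlite3 *db) {
      sqlite3_exec(db,
          "CREATE TABLE reading("
          "  source TEXT NOT NULL,"     // the former table name
          "  t      INTEGER NOT NULL,"  // sample time
          "  value  REAL,"
          "  PRIMARY KEY(source, t)"
          ");",
          nullptr, nullptr, nullptr);
    }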

Simon.