Re: [sqlite] SQLite scalability

2013-11-22 Thread Howard Chu

Valentin Davydov wrote:

On Thu, Nov 21, 2013 at 11:43:32AM +0200, Baruch Burstein wrote:

Hi all,

I know SQLite is supposed to support DB sizes in the TB range (I think the
default configuration can reach 1 TB). I am curious if anyone actually uses
SQLite at anywhere near this.


Yes.


Does anyone use it regularly for DBs 500GB+
in size, or with tables containing 10 billion rows+?


I've got an installation with an SQLite DB several terabytes in size.
It contains about 20 billion thoroughly indexed records, and grows every
day (more precisely, every night) by a few tens of millions of new records.


How much concurrency does your use require?


I've spent some effort eliminating concurrency in the application. That is,
updates and selects occur at very different times of the day.


How long do selects take (assuming indexes are set correctly?)


It depends on the size of the select. A single row is selected instantaneously.
A uniqueness check takes about 1-2 minutes per million records, with most of
the time spent parsing SQL commands. Aggregate functions over a substantial
fraction of the entire database, of course, take too long to be executed in
real time.


Use SQLightning instead; concurrency issues are irrelevant then, since
writers don't block readers. And it will search multiple gigabytes per second,
as opposed to your millions-per-minute figure above.
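(For comparison, stock SQLite's WAL journal mode also lets readers proceed
while a single writer is active. A minimal sketch using Python's built-in
sqlite3 module; the database path is hypothetical:)

```python
import os
import sqlite3
import tempfile

# Hypothetical database path; WAL requires a real file, not :memory:
path = os.path.join(tempfile.mkdtemp(), "example.db")
conn = sqlite3.connect(path)

# In WAL mode, readers see a stable snapshot while one writer appends,
# so readers and the writer no longer block each other.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # 'wal'
conn.close()
```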



Are there problems of locking ("normal" SQLite doesn't usually
suffer from locking since transactions are very quick, but if transactions
can be in the order of 100's of ms, I think locking can easily happen if
the DB is accessed concurrently a few times a second, though I am not sure
if this may only apply to writes).


Yes. A single transaction (the insertion of those tens of millions of new
records) takes hours in the worst case.


There's no good reason for insertion of 10 million records to take hours.
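(The standard remedy is a prepared statement reused inside one enclosing
transaction; a rough sketch in Python's sqlite3, with made-up table and
column names:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, val TEXT)")

rows = ((i, "x") for i in range(100_000))
with conn:  # one transaction: a single commit instead of one per row
    # executemany reuses one prepared statement for every row
    conn.executemany("INSERT INTO records(id, val) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)  # 100000
```

On disk, inserting in primary-key order also keeps the b-tree appends
mostly sequential, which matters on spinning disks.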


I understand that the answer to most of these questions can be very
hardware (and software) dependent,


Indeed not so. The only hardware capable of storing such an amount of data
is an array of magnetic disks, and their latency (about 10-20 ms for
random access) is much greater than any reasonable software overhead. Even a
cache (the internal SQLite page cache and/or the operating system file cache)
occupies the same memory and therefore has almost the same effect. The
only software that determines the performance is SQLite itself; in my
case, perhaps, the tree-rebalancing algorithm.


1TB SSDs are only ~$500; there's no reason to limit yourself to the slowness 
of magnetic disks these days.


http://www.amazon.com/Samsung-Electronics-EVO-Series-2-5-Inch-MZ-7TE1T0BW/dp/B00E3W16OU

SQLightning uses only the OS filesystem cache, so you get maximal use of the 
available system RAM instead of wasting half of it with redundant copies in 
application-level caches.



but I am just trying to get a feel for
SQLite's applicability for a project I am working on that may reach limits
like these.


The only definitive SQLite limits are documented in the relevant manual page.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite scalability

2013-11-22 Thread Valentin Davydov
On Thu, Nov 21, 2013 at 11:43:32AM +0200, Baruch Burstein wrote:
> Hi all,
> 
> I know SQLite is supposed to support DB sizes in the TB range (I think the
> default configuration can reach 1 TB). I am curious if anyone actually uses
> SQLite at anywhere near this.

Yes. 

> Does anyone use it regularly for DBs 500GB+
> in size, or with tables containing 10 billion rows+? 

I've got an installation with an SQLite DB several terabytes in size.
It contains about 20 billion thoroughly indexed records, and grows every
day (more precisely, every night) by a few tens of millions of new records.

> How much concurrency does your use require?

I've spent some effort eliminating concurrency in the application. That is,
updates and selects occur at very different times of the day.

> How long do selects take (assuming indexes are set correctly?)

It depends on the size of the select. A single row is selected instantaneously.
A uniqueness check takes about 1-2 minutes per million records, with most of
the time spent parsing SQL commands. Aggregate functions over a substantial
fraction of the entire database, of course, take too long to be executed in
real time.
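(The parsing overhead mentioned above can be amortized with prepared
statements: parse once, bind many times. Python's sqlite3 does this
automatically via its per-connection statement cache whenever the SQL text
is identical; a sketch with made-up names:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT UNIQUE)")
conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("b",)])

# Identical SQL text hits sqlite3's internal statement cache, so the
# command is parsed and compiled once, not once per call.
sql = "SELECT EXISTS(SELECT 1 FROM t WHERE k = ?)"
hits = [conn.execute(sql, (k,)).fetchone()[0] for k in ("a", "b", "c")]
print(hits)  # [1, 1, 0]
```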

> Are there problems of locking ("normal" SQLite doesn't usually
> suffer from locking since transactions are very quick, but if transactions
> can be in the order of 100's of ms, I think locking can easily happen if
> the DB is accessed concurrently a few times a second, though I am not sure
> if this may only apply to writes).

Yes. A single transaction (the insertion of those tens of millions of new
records) takes hours in the worst case.

> I understand that the answer to most of these questions can be very
> hardware (and software) dependent,

Indeed not so. The only hardware capable of storing such an amount of data
is an array of magnetic disks, and their latency (about 10-20 ms for
random access) is much greater than any reasonable software overhead. Even a
cache (the internal SQLite page cache and/or the operating system file cache)
occupies the same memory and therefore has almost the same effect. The
only software that determines the performance is SQLite itself; in my
case, perhaps, the tree-rebalancing algorithm.

> but I am just trying to get a feel for
> SQLite's applicability for a project I am working on that may reach limits
> like these.

The only definitive SQLite limits are documented in the relevant manual page.

Valentin Davydov.


Re: [sqlite] SQLite scalability

2013-11-22 Thread Eduardo Morras
On Thu, 21 Nov 2013 11:43:32 +0200
Baruch Burstein  wrote:

> Hi all,
> 
> I know SQLite is supposed to support DB sizes in the TB range (I think the
> default configuration can reach 1 TB). I am curious if anyone actually uses
> SQLite at anywhere near this. Does anyone use it regularly for DBs 500GB+
> in size, or with tables containing 10 billion rows+? How much concurrency
> does your use require? How long do selects take (assuming indexes are set
> correctly?) Are there problems of locking ("normal" SQLite doesn't usually
> suffer from locking since transactions are very quick, but if transactions
> can be in the order of 100's of ms, I think locking can easily happen if
> the DB is accessed concurrently a few times a second, though I am not sure
> if this may only apply to writes).
> 
> I understand that the answer to most of these questions can be very
> hardware (and software) dependent, but I am just trying to get a feel for
> SQLite's applicability for a project I am working on that may reach limits
> like these.

It depends on what kind of use you need for your data. If you are going to do
a lot of inserts, updates, and deletes, perhaps SQLite isn't for you.

If its principal use is for selects, then when you populate the tables, do it
pre-sorted by the primary key, or by the column that makes the highest-quality
index for your use (look for 'low-quality indexes' in the SQLite docs, and do
the opposite).

Normalize your db as much as you can, but not more ;) .

SQLite allows you to attach up to 30 database files. Split your data between
2-3 databases and put each one on a different disk.
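(The split is done with ATTACH; a minimal sketch, using in-memory databases
in place of the hypothetical per-disk files:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")                 # the "main" database
# In practice this would be e.g. '/disk2/part2.db' on a second spindle
conn.execute("ATTACH DATABASE ':memory:' AS db2")

conn.execute("CREATE TABLE main.a (x)")
conn.execute("CREATE TABLE db2.b (y)")
conn.execute("INSERT INTO db2.b VALUES (1)")

# Queries address the attached database through its schema name
n = conn.execute("SELECT COUNT(*) FROM db2.b").fetchone()[0]
print(n)  # 1
```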

Increase the cache size before creating any table; other RDBMSs use a lot of
memory, from several MBs to GBs, so give the SQLite cache 500 MB-1 GB, for
example.
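(The cache is set per connection with a PRAGMA; a negative value means size
in KiB rather than pages, so roughly 500 MB looks like this:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Negative value = size in KiB; -500000 asks for ~500 MB of page cache
conn.execute("PRAGMA cache_size = -500000")
size = conn.execute("PRAGMA cache_size").fetchone()[0]
print(size)  # -500000
```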

Write your queries in different ways and test which is better; there are some
tricks with indexes, joins, etc. that can help you (study the documentation).

Enable STAT3 or STAT4 (compile-time options) so that ANALYZE gathers histogram
statistics about your data.
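(ANALYZE itself works in any build and already helps the query planner; the
STAT3/STAT4 histograms additionally require a library compiled with
SQLITE_ENABLE_STAT3 or SQLITE_ENABLE_STAT4. A sketch with made-up names:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.execute("CREATE INDEX t_a ON t(a)")
conn.executemany("INSERT INTO t VALUES (?)", [(i % 10,) for i in range(100)])

# ANALYZE records index statistics in sqlite_stat1, which the query
# planner consults when choosing between indexes.
conn.execute("ANALYZE")
rows = conn.execute("SELECT tbl, idx FROM sqlite_stat1").fetchall()
# rows should now contain an entry for the analyzed index t_a
```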

Set auto_vacuum to FULL before creating any table, even if you don't plan to
delete/update data. If I remember correctly (if not, please correct me),
auto_vacuum adds metadata to the db file that allows the SQLite engine to do
some internal work faster.

> 
> Thanks,
> Baruch
> 
> -- 
> ˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı


---   ---
Eduardo Morras 