Thanks Michael.  Comments below:

> -----Original Message-----
> From: [email protected] [mailto:sqlite-users-
> [email protected]] On Behalf Of Black, Michael (IS)
> Sent: 09 November 2012 13:49
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Anomalously slow performance on updates to early
> entries in a DB
> 
> Hmmm...is this a disk head seeking problem?
> You've got several TEXT entries which are either NULL or default to '';
> I did a small experiment: on 3.7.13
> CREATE TABLE container_stat (
>                 account TEXT,
>                 container TEXT,
>                 created_at TEXT,
>                 put_timestamp TEXT DEFAULT '0',
>                 delete_timestamp TEXT DEFAULT '0',
>                 object_count INTEGER,
>                 bytes_used INTEGER,
>                 reported_put_timestamp TEXT DEFAULT '0',
>                 reported_delete_timestamp TEXT DEFAULT '0',
>                 reported_object_count INTEGER DEFAULT 0,
>                 reported_bytes_used INTEGER DEFAULT 0,
>                 hash TEXT default '00000000000000000000000000000000',
>                 id TEXT,
>                 status TEXT DEFAULT '',
>                 status_changed_at TEXT DEFAULT '0',
>                 metadata TEXT DEFAULT '',
>                 x_container_sync_point1 INTEGER DEFAULT -1,
>                 x_container_sync_point2 INTEGER DEFAULT -1);
> insert into container_stat(id,status,status_changed_at)
> values('id1','status1','');
> insert into container_stat(id,status,status_changed_at)
> values('id2','status2','');
> insert into container_stat(id,status,status_changed_at)
> values('id3','status3','');
> insert into container_stat(id,status,status_changed_at)
> values('id4','status4','');
> insert into container_stat(id,status,status_changed_at)
> values('id5','status5','');
> delete from container_stat where account='id1';
> insert into container_stat(account,status,status_changed_at)
> values('id1','status1 change1 to something
> else','status_changhed_at_1');
> 
> If you look at the order of data BEFORE the delete/insert occurs you
> get this (using "strings");
> 000000000000000000000000000000000000id5status5
> 000000000000000000000000000000000000id4status4
> 000000000000000000000000000000000000id3status3
> 000000000000000000000000000000000000id2status2
> 000000000000000000000000000000000000id1status1
> 
> After a delete/insert where one of the fields grows in size you get
> this:
> id1000000000000000000000000000000000000status1 change1 to something
> elsestatus_changed_at_1
> 000000000000000000000000000000000000id5status5
> 000000000000000000000000000000000000id4status4
> 000000000000000000000000000000000000id3status3
> 000000000000000000000000000000000000id2status2
> 000000000000000000000000000000000000id1status1
> 
> So...the probability of an update needing to move due to larger data
> increases the closer you are to the beginning of the database.  Each
> update would reduce the likelihood of that record getting relocated
> again as the fields grow in size.
>
[O'Toole, Eamonn] This definitely sounds like it could be an issue.  There is 
just one container_stat entry (confirmed by sqlite3_analyzer output which I'll 
post later).  So you're saying that the single container_stat table entry is 
potentially being relocated very frequently the closer the update is to the 
beginning of the db?


 
> If you use default values that are 2 sigma of the sizes of your strings
> (or maybe just 2X the average length or so) you would reduce the head
> seek time on updating records.
> 
> It basically sounds possibly like your disk head is slewing from the
> beginning of the file to the end on many of your updates.
[O'Toole, Eamonn] What we do see is that the disk utilization is at 100% during 
updates to the beginning of the file, but the actual throughput in terms of 
MB/s is low - which tallies with your hypothesis.

> 
> Also...would "insert or update" help you a bit?
> http://www.sqlite.org/lang_conflict.html
[O'Toole, Eamonn] I'll have a look at this, thanks.
> 
> 
> 
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Advanced GEOINT Solutions Operating Unit
> Northrop Grumman Information Systems
> 
> ________________________________________
> From: [email protected] [[email protected]]
> on behalf of O'Toole, Eamonn [[email protected]]
> Sent: Friday, November 09, 2012 5:07 AM
> To: [email protected]
> Subject: EXT :[sqlite] Anomalously slow performance on updates to early
> entries in a DB
> 
> Hello all,
> 
> First of all, I'm a complete novice with respect to SQLite so apologies
> if there is an obvious answer to my question.  I've also posted this
> question in IRC, and it was suggested that I post the question to this
> mailing list.  We're running a test-bed of an object store (Openstack
> Swift) which uses SQLite to record information on the objects that are
> stored in a container.  The table itself (called "object") is very
> simple, it contains the modification time of the object, the size of
> object, the md5sum of the object, and the content-type of the object.
> We are seeing a performance anomaly on updates to existing object
> records in the SQLite DB.  If the container DB is sufficiently large
> (about 10 million objects, 3.3GB) then the time to update records at
> the beginning of the database by order of entry is anomalously high.
> The time is particularly bad for the first approx. 100K records, is
> somewhat better for the next 900K records, and settles down to a
> consistent average from
>   approx 1 million records on.  If this consistent average time is
> around 7 seconds for 10,000 updates, then we see times of about 170
> seconds for 10,000 updates on the first 100K records.  We don't see
> this anomalously high update time if we start the updates after the
> first 1 million records.
> 
> Note that table updates are performed by first DELETEing the entry and
> then INSERTing the changed entry.
> 
> Does anybody have any idea why we're seeing this behaviour, and what we
> can do to fix it?
> 
> Note that Chunky_Ks (IRC) has suggested that B-tree rebalancing could
> be the fundamental issue.
> 
> Thanks.
> 
> 
> SQL Schema follows:
> 
> 1 database: main
> 
> Tables:
> container_stat
> incoming_sync
> object
> outgoing_sync
> 
> container_stat:
> CREATE TABLE container_stat (
>                 account TEXT,
>                 container TEXT,
>                 created_at TEXT,
>                 put_timestamp TEXT DEFAULT '0',
>                 delete_timestamp TEXT DEFAULT '0',
>                 object_count INTEGER,
>                 bytes_used INTEGER,
>                 reported_put_timestamp TEXT DEFAULT '0',
>                 reported_delete_timestamp TEXT DEFAULT '0',
>                 reported_object_count INTEGER DEFAULT 0,
>                 reported_bytes_used INTEGER DEFAULT 0,
>                 hash TEXT default '00000000000000000000000000000000',
>                 id TEXT,
>                 status TEXT DEFAULT '',
>                 status_changed_at TEXT DEFAULT '0',
>                 metadata TEXT DEFAULT '',
>                 x_container_sync_point1 INTEGER DEFAULT -1,
>                 x_container_sync_point2 INTEGER DEFAULT -1
> 
> incoming_sync:
> CREATE TABLE incoming_sync (
>                 remote_id TEXT UNIQUE,
>                 sync_point INTEGER,
>                 updated_at TEXT DEFAULT 0
>             );
> CREATE TRIGGER incoming_sync_insert AFTER INSERT ON incoming_sync
>             BEGIN
>                 UPDATE incoming_sync
>                 SET updated_at = STRFTIME('%s', 'NOW')
>                 WHERE ROWID = new.ROWID;
>             END;
> CREATE TRIGGER incoming_sync_update AFTER UPDATE ON incoming_sync
>             BEGIN
>                 UPDATE incoming_sync
>                 SET updated_at = STRFTIME('%s', 'NOW')
>                 WHERE ROWID = new.ROWID;
>             END;
> 
> object:
> CREATE TABLE object (
>                 ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
>                 name TEXT,
>                 created_at TEXT,
>                 size INTEGER,
>                 content_type TEXT,
>                 etag TEXT,
>                 deleted INTEGER DEFAULT 0
>             );
> CREATE INDEX ix_object_deleted_name ON object (deleted, name);
> CREATE TRIGGER object_delete AFTER DELETE ON object
>             BEGIN
>                 UPDATE container_stat
>                 SET object_count = object_count - (1 - old.deleted),
>                     bytes_used = bytes_used - old.size,
>                     hash = chexor(hash, old.name, old.created_at);
>             END;
> CREATE TRIGGER object_insert AFTER INSERT ON object
>             BEGIN
>                 UPDATE container_stat
>                 SET object_count = object_count + (1 - new.deleted),
>                     bytes_used = bytes_used + new.size,
>                     hash = chexor(hash, new.name, new.created_at);
>             END;
> CREATE TRIGGER object_update BEFORE UPDATE ON object
>             BEGIN
>                 SELECT RAISE(FAIL, 'UPDATE not allowed; DELETE and
> INSERT');
>             END;
> 
> outgoing_sync:
> CREATE TABLE outgoing_sync (
>                remote_id TEXT UNIQUE,
>                 sync_point INTEGER,
>                 updated_at TEXT DEFAULT 0
>             );
> CREATE TRIGGER outgoing_sync_insert AFTER INSERT ON outgoing_sync
>             BEGIN
>                 UPDATE outgoing_sync
>                 SET updated_at = STRFTIME('%s', 'NOW')
>                 WHERE ROWID = new.ROWID;
>             END;
> CREATE TRIGGER outgoing_sync_update AFTER UPDATE ON outgoing_sync
>             BEGIN
>                 UPDATE outgoing_sync
>                 SET updated_at = STRFTIME('%s', 'NOW')
>                 WHERE ROWID = new.ROWID;
>             END;
> 
> 
> 
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to