Hmmm...is this a disk head seeking problem?
You've got several TEXT columns which are either NULL or default to ''.
I did a small experiment on SQLite 3.7.13:
CREATE TABLE container_stat (
account TEXT,
container TEXT,
created_at TEXT,
put_timestamp TEXT DEFAULT '0',
delete_timestamp TEXT DEFAULT '0',
object_count INTEGER,
bytes_used INTEGER,
reported_put_timestamp TEXT DEFAULT '0',
reported_delete_timestamp TEXT DEFAULT '0',
reported_object_count INTEGER DEFAULT 0,
reported_bytes_used INTEGER DEFAULT 0,
hash TEXT default '00000000000000000000000000000000',
id TEXT,
status TEXT DEFAULT '',
status_changed_at TEXT DEFAULT '0',
metadata TEXT DEFAULT '',
x_container_sync_point1 INTEGER DEFAULT -1,
x_container_sync_point2 INTEGER DEFAULT -1);
insert into container_stat(id,status,status_changed_at)
values('id1','status1','');
insert into container_stat(id,status,status_changed_at)
values('id2','status2','');
insert into container_stat(id,status,status_changed_at)
values('id3','status3','');
insert into container_stat(id,status,status_changed_at)
values('id4','status4','');
insert into container_stat(id,status,status_changed_at)
values('id5','status5','');
delete from container_stat where id='id1';
insert into container_stat(account,status,status_changed_at)
values('id1','status1 change1 to something else','status_changed_at_1');
If you look at the order of the data BEFORE the delete/insert occurs you get this
(using "strings"):
000000000000000000000000000000000000id5status5
000000000000000000000000000000000000id4status4
000000000000000000000000000000000000id3status3
000000000000000000000000000000000000id2status2
000000000000000000000000000000000000id1status1
After a delete/insert where one of the fields grows in size you get this:
id1000000000000000000000000000000000000status1 change1 to something
elsestatus_changed_at_1
000000000000000000000000000000000000id5status5
000000000000000000000000000000000000id4status4
000000000000000000000000000000000000id3status3
000000000000000000000000000000000000id2status2
000000000000000000000000000000000000id1status1
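Here's roughly the same experiment as a self-contained Python script, in case you want to reproduce it (simplified table with only the columns that matter, and the delete/insert keyed on id; the rowid check is my addition, not part of the strings output above):

```python
# Sketch reproducing the delete/re-insert experiment with Python's sqlite3
# module. Simplified table: only the columns the experiment actually touches.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "container.db")
db = sqlite3.connect(path)
db.execute("""CREATE TABLE container_stat (
    hash TEXT DEFAULT '00000000000000000000000000000000',
    id TEXT,
    status TEXT DEFAULT '',
    status_changed_at TEXT DEFAULT '0')""")
for n in range(1, 6):
    db.execute("INSERT INTO container_stat(id, status, status_changed_at) "
               "VALUES (?, ?, '')", ("id%d" % n, "status%d" % n))
db.commit()

# Delete the oldest row, then re-insert it with a longer status value.
db.execute("DELETE FROM container_stat WHERE id = 'id1'")
db.execute("INSERT INTO container_stat(id, status, status_changed_at) "
           "VALUES ('id1', 'status1 change1 to something else', "
           "'status_changed_at_1')")
db.commit()

# The re-inserted row gets a fresh rowid past the surviving rows 2..5,
# i.e. the record was rewritten elsewhere rather than updated in place.
new_rowid = db.execute(
    "SELECT rowid FROM container_stat WHERE id = 'id1'").fetchone()[0]
print(new_rowid)  # 6
```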
So...the probability of an update needing to move a record because the data grew
increases the closer the record is to the beginning of the database. Each update
also reduces the likelihood of that record being relocated again, since its
fields have already grown.
If you use default values sized at about 2 sigma of your string-length
distribution (or maybe just 2X the average length or so) you would reduce the
head seek time when updating records.
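To illustrate the padding idea (the 64-character width here is a made-up figure; you'd pick something near 2X your own average string length):

```python
# Sketch of over-sized defaults: insert a placeholder wide enough that later
# values usually fit in the space the record already occupies, so the record
# doesn't grow on update. PAD = 64 is a hypothetical width, not a measured one.
import sqlite3

PAD = 64  # assumption: roughly 2x the expected average string length

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stat (id TEXT, status TEXT)")
db.execute("INSERT INTO stat VALUES ('id1', ?)", (" " * PAD,))

# A later value padded to the same width keeps the record size unchanged.
db.execute("UPDATE stat SET status = ? WHERE id = 'id1'",
           ("active".ljust(PAD),))
status = db.execute(
    "SELECT status FROM stat WHERE id = 'id1'").fetchone()[0]
print(repr(status.strip()))  # 'active'
```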
It basically sounds like your disk head may be slewing from the beginning of
the file to the end on many of your updates.
Also...would "INSERT OR REPLACE" help you a bit?
http://www.sqlite.org/lang_conflict.html
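Something like this (a made-up stand-in table, not your Swift schema; it needs a UNIQUE column to conflict on):

```python
# Sketch of INSERT OR REPLACE (see lang_conflict.html): with a UNIQUE column,
# one statement replaces the existing row instead of a separate DELETE+INSERT.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE obj (name TEXT UNIQUE, created_at TEXT)")
db.execute("INSERT INTO obj VALUES ('photo.jpg', 't1')")

# Conflicts on the UNIQUE name, so the old row is replaced in one statement.
db.execute("INSERT OR REPLACE INTO obj VALUES ('photo.jpg', 't2')")
rows = db.execute("SELECT name, created_at FROM obj").fetchall()
print(rows)  # [('photo.jpg', 't2')]
```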
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems
________________________________________
From: [email protected] [[email protected]] on
behalf of O'Toole, Eamonn [[email protected]]
Sent: Friday, November 09, 2012 5:07 AM
To: [email protected]
Subject: EXT :[sqlite] Anomalously slow performance on updates to early entries
in a DB
Hello all,
First of all, I'm a complete novice with respect to SQLite so apologies if
there is an obvious answer to my question. I've also posted this question in
IRC, and it was suggested that I post the question to this mailing list. We're
running a test-bed of an object store (Openstack Swift) which uses SQLite to
record information on the objects that are stored in a container. The table
itself (called "object") is very simple; it contains the modification time of
the object, the size of the object, the md5sum of the object, and the content-type
of the object. We are seeing a performance anomaly on updates to existing
object records in the SQLite DB. If the container DB is sufficiently large
(about 10 million objects, 3.3GB) then the time to update records at the
beginning of the database by order of entry is anomalously high. The time is
particularly bad for the first approx. 100K records, is somewhat better for the
next 900K records, and settles down to a consistent average from
approx 1 million records on. If this consistent average time is around 7
seconds for 10,000 updates, then we see times of about 170 seconds for 10,000
updates on the first 100K records. We don't see this anomalously high update
time if we start the updates after the first 1 million records.
Note that table updates are performed by first DELETEing the entry and then
INSERTing the changed entry.
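In outline, the pattern looks like this (a stripped-down copy of the object table; the triggers and our custom chexor function are omitted):

```python
# Minimal sketch of the DELETE-then-INSERT update pattern described above,
# against a stripped-down object table (triggers and chexor omitted).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE object (
    ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    created_at TEXT,
    size INTEGER,
    content_type TEXT,
    etag TEXT,
    deleted INTEGER DEFAULT 0)""")
db.execute("INSERT INTO object (name, created_at, size) "
           "VALUES ('obj1', 't1', 10)")

# An "update" is a delete of the old record followed by an insert of the
# changed record.
db.execute("DELETE FROM object WHERE name = 'obj1'")
db.execute("INSERT INTO object (name, created_at, size) "
           "VALUES ('obj1', 't2', 20)")
row = db.execute(
    "SELECT created_at, size FROM object WHERE name = 'obj1'").fetchone()
print(row)  # ('t2', 20)
```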
Does anybody have any idea why we're seeing this behaviour, and what we can do
to fix it?
Note that Chunky_Ks (IRC) has suggested that B-tree rebalancing could be the
fundamental issue.
Thanks.
SQL Schema follows:
1 database: main
Tables:
container_stat
incoming_sync
object
outgoing_sync
container_stat:
CREATE TABLE container_stat (
account TEXT,
container TEXT,
created_at TEXT,
put_timestamp TEXT DEFAULT '0',
delete_timestamp TEXT DEFAULT '0',
object_count INTEGER,
bytes_used INTEGER,
reported_put_timestamp TEXT DEFAULT '0',
reported_delete_timestamp TEXT DEFAULT '0',
reported_object_count INTEGER DEFAULT 0,
reported_bytes_used INTEGER DEFAULT 0,
hash TEXT default '00000000000000000000000000000000',
id TEXT,
status TEXT DEFAULT '',
status_changed_at TEXT DEFAULT '0',
metadata TEXT DEFAULT '',
x_container_sync_point1 INTEGER DEFAULT -1,
x_container_sync_point2 INTEGER DEFAULT -1);
incoming_sync:
CREATE TABLE incoming_sync (
remote_id TEXT UNIQUE,
sync_point INTEGER,
updated_at TEXT DEFAULT 0
);
CREATE TRIGGER incoming_sync_insert AFTER INSERT ON incoming_sync
BEGIN
UPDATE incoming_sync
SET updated_at = STRFTIME('%s', 'NOW')
WHERE ROWID = new.ROWID;
END;
CREATE TRIGGER incoming_sync_update AFTER UPDATE ON incoming_sync
BEGIN
UPDATE incoming_sync
SET updated_at = STRFTIME('%s', 'NOW')
WHERE ROWID = new.ROWID;
END;
object:
CREATE TABLE object (
ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
created_at TEXT,
size INTEGER,
content_type TEXT,
etag TEXT,
deleted INTEGER DEFAULT 0
);
CREATE INDEX ix_object_deleted_name ON object (deleted, name);
CREATE TRIGGER object_delete AFTER DELETE ON object
BEGIN
UPDATE container_stat
SET object_count = object_count - (1 - old.deleted),
bytes_used = bytes_used - old.size,
hash = chexor(hash, old.name, old.created_at);
END;
CREATE TRIGGER object_insert AFTER INSERT ON object
BEGIN
UPDATE container_stat
SET object_count = object_count + (1 - new.deleted),
bytes_used = bytes_used + new.size,
hash = chexor(hash, new.name, new.created_at);
END;
CREATE TRIGGER object_update BEFORE UPDATE ON object
BEGIN
SELECT RAISE(FAIL, 'UPDATE not allowed; DELETE and INSERT');
END;
outgoing_sync:
CREATE TABLE outgoing_sync (
remote_id TEXT UNIQUE,
sync_point INTEGER,
updated_at TEXT DEFAULT 0
);
CREATE TRIGGER outgoing_sync_insert AFTER INSERT ON outgoing_sync
BEGIN
UPDATE outgoing_sync
SET updated_at = STRFTIME('%s', 'NOW')
WHERE ROWID = new.ROWID;
END;
CREATE TRIGGER outgoing_sync_update AFTER UPDATE ON outgoing_sync
BEGIN
UPDATE outgoing_sync
SET updated_at = STRFTIME('%s', 'NOW')
WHERE ROWID = new.ROWID;
END;
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users