Dominique Devienne wrote: 

> > Looking at the sqlite web site and mailing lists shows that the SQLite 
> > team has taken a stab at answering the question, "is it faster to read 
> > a blob out of sqlite or out of a file?".  See the links below.  
> > 
> > Does the team have analogous guidance regarding write speeds?  
> > 
> 
> Depends if you care only about the initial write, or also about 
> incremental writes.  

Thanks to those of you who responded already!  I'll give a little 
more flavor for those who are interested.  

My particular application is a cache of large, immutable files.  The 
UPDATE case doesn't apply to me for this particular application.  

I already have a system on top of SQLite's blobs in which I store the 
file data in chunks, one chunk per row.  (I'm downloading the files 
from the network, and at the moment am using a version of the Python 
SQLite API that does not expose the incremental blob I/O.)  So I 
download a few megabytes, start the next chunk download in the 
background (using the "requests-futures" Python library, which is 
really sexy BTW), write out the current chunk, commit, and continue; 
a sketch of the loop follows the schema below.  The row also has the 
byte range that it is storing: 

  CREATE TABLE IF NOT EXISTS file( 
      fileID TEXT NOT NULL, 
      start  INTEGER NOT NULL,  -- offset of first byte of chunk 
      end    INTEGER NOT NULL,  -- offset of last byte of chunk 
      bytes  BLOB NOT NULL, 

      CONSTRAINT BlobSize CHECK(length(bytes) == (end-start+1))
  ); 
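
In code, the write loop looks roughly like the sketch below (Python, 
stdlib sqlite3; fetch_range() is a hypothetical stand-in for the 
requests-futures download, returning a future whose result() is the 
chunk's bytes): 

  import sqlite3

  CHUNK = 4 * 1024 * 1024   # 4MB chunks; one of the knobs I've tuned

  def store_file(con, file_id, total_size):
      offset = 0
      future = fetch_range(file_id, 0, min(CHUNK, total_size) - 1)
      while offset < total_size:
          data = future.result()        # wait for the current chunk
          next_off = offset + len(data)
          if next_off < total_size:     # start the next download now
              future = fetch_range(file_id, next_off,
                                   min(next_off + CHUNK, total_size) - 1)
          con.execute(
              "INSERT INTO file(fileID, start, end, bytes)"
              " VALUES (?, ?, ?, ?)",
              (file_id, offset, next_off - 1, data))
          con.commit()                  # commit interval: every chunk
          offset = next_off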

All requests are for retrieval of a particular range of bytes for a 
particular file.  So I have an index: 

  CREATE INDEX IF NOT EXISTS FileIdxNameStart ON file(
      fileID, 
      start
  ); 

and I create my own little "query plan" in application logic for 
servicing a particular request on the read side.  The read side is 
fast enough for my purposes.  
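
For the curious, that application-level plan boils down to something 
like this (a sketch; it assumes the chunks are contiguous and 
non-overlapping, which the write loop above guarantees): 

  def read_range(con, file_id, lo, hi):
      out = bytearray()
      rows = con.execute(
          "SELECT start, end, bytes FROM file"
          " WHERE fileID = ? AND start <= ? AND end >= ?"
          " ORDER BY start", (file_id, hi, lo))
      for start, end, data in rows:
          # trim the first/last chunks down to the requested range
          out += data[max(lo - start, 0):len(data) - max(end - hi, 0)]
      return bytes(out)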

Inserts are slow and deletes are slow.  The disk I am using is capable 
of writing about 6.5MB/sec, but my app + SQLite only hit around 
600KB-1MB/sec under this scheme, depending on some variables.  That is 
roughly half the overall speed I expect: if I use 'wget' to just write 
a downloaded file straight to disk, I get about 2MB/sec sustained over 
hundreds of MB.  I've played with the chunk size, the synchronous 
pragma, the commit interval, the page size pragma, and the cache size 
pragma.  
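
For reference, those knobs translate to pragmas along these lines (the 
values here are illustrative, not a recommendation; note that 
page_size only takes effect before the database file is created, or 
after a VACUUM): 

  con = sqlite3.connect("cache.db")
  con.execute("PRAGMA page_size = 4096")      # must precede table creation
  con.execute("PRAGMA synchronous = NORMAL")  # vs. the default FULL
  con.execute("PRAGMA cache_size = -65536")   # negative N => N KiB (~64MB)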

I have not cleared my application of all blame yet, since I have not
taken enough measurements.  I'm considering prototyping a C
implementation that uses incremental blob I/O to write out the file
and is cleverer about the download state machine, but I was hoping to 
learn from this group whether that is a fool's errand (e.g. whether 
you all have some particular reason to know that huge blobs in SQLite 
are a priori a bad idea for performance).
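
The C API I have in mind is sqlite3_blob_open()/sqlite3_blob_write(); 
in Python terms the shape would be as below (a sketch -- it assumes 
Python 3.11+, whose sqlite3 module exposes the same facility as 
Connection.blobopen(), one row per file, and a hypothetical 
download_chunks() generator yielding the file's bytes in order): 

  cur = con.execute(
      "INSERT INTO file(fileID, start, end, bytes)"
      " VALUES (?, 0, ?, zeroblob(?))",   # preallocate the full blob
      (file_id, total_size - 1, total_size))
  with con.blobopen("file", "bytes", cur.lastrowid) as blob:
      for chunk in download_chunks(file_id):
          blob.write(chunk)               # writes at the current offset
  con.commit()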

Deletes by fileID are really awful compared to a simple filesystem 
delete -- like 2 orders of magnitude worse by my eyeballing, at least
with the default 1024-byte page size.  Then again, I think you'd 
expect that: an unlink(2) only has to touch a few pages of disk, 
whereas an SQLite delete of 1GB worth of blob rows presumably goes and 
touches something every page_size bytes (right?) -- at a 1024-byte 
page size, that is on the order of a million pages.

So I'm thinking of just storing the data outside the database, perhaps
retaining an SQLite index.  But I'm worried about the safety issues 
that arise from that.  E.g. I finish writing my file, I write an index 
entry and commit it, and immediately afterward I get a power failure.
Am I sure that the file bytes are correct?  And so on.  All of those
considerations are right in SQLite's wheelhouse -- I'd hate not to be
able to take advantage of it.
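
If I go that route, the ordering I'd have to get right looks roughly 
like the sketch below (the fileIndex table and path handling here are 
hypothetical): make the file durable *before* committing the index 
row, so a crash can never leave a committed index entry pointing at 
missing or partial bytes. 

  import os

  def store_external(con, file_id, path, data):
      tmp = path + ".tmp"
      with open(tmp, "wb") as f:
          f.write(data)
          f.flush()
          os.fsync(f.fileno())     # file contents durable
      os.rename(tmp, path)         # atomic publish on POSIX filesystems
      dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
      try:
          os.fsync(dirfd)          # make the directory entry durable too
      finally:
          os.close(dirfd)
      con.execute("INSERT INTO fileIndex(fileID, path) VALUES (?, ?)",
                  (file_id, path))
      con.commit()                 # only now publish in the index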


-- 
Eric A. Rubin-Smith
