Re: [jira] Commented: (COUCHDB-61) Separate storage of attachments from the main database file

Damien Katz Sun, 28 Dec 2008 03:25:28 -0800

What you are proposing is grouping the document data together for better OS caching performance, but CouchDB already does that. Document bodies are written to contiguous regions, one after another; the attachments are stored in separate locations in the file.


-Damien



On Dec 28, 2008, at 5:21 AM, Maximillian Dornseif (JIRA) wrote:

[ https://issues.apache.org/jira/browse/COUCHDB-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659425#action_12659425 ]
Maximillian Dornseif commented on COUCHDB-61:
---------------------------------------------
To my understanding of OS access patterns, they actually do interfere with document access and view building.
Let's say you have 100 MB of documents and 100 GB of attachments. To my (very limited) understanding, in the current database file "normal" B+-tree pages and attachments would be interleaved. So on disk (assuming best-case contiguous allocation) it would look something like
DAAADADDAAAAAAAADADDAAAAAAADAAAADAAAAADAAAAADAAAAADADAAAAAADA

(D=doc, A=attachment)
So if I want to access the whole B+-tree for an operation, I have to skip around in the file by using seeks or some other technique. Seeks harm the caching ability of the OS and are generally slow. And the OS obviously is not able to read the whole 100.1 GB file into its cache in a single chunk.
Compare that to having two files:

DDDDDDDDDDDDDD = document file, 100 MB
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA = attachment file,100 GB
Now the OS caching subsystem can happily read the whole document file into RAM. Even if CouchDB uses seeks, they can be served from the cache. The OS doesn't have to work around the huge binary chunks of seldom-accessed attachment data.
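The difference between the two layouts can be pictured with a toy model. This is only a sketch under the assumption that every contiguous run of D blocks costs one seek to reach; `count_runs` is a hypothetical helper and says nothing about CouchDB's actual file format.

```python
from itertools import groupby

def count_runs(layout: str, kind: str = "D") -> int:
    """Count contiguous runs of `kind` blocks in a layout string.

    Toy assumption: each separate run costs one seek to reach.
    """
    return sum(1 for block, _ in groupby(layout) if block == kind)

# Layout strings copied from the example above (D=doc, A=attachment).
interleaved = "DAAADADDAAAAAAAADADDAAAAAAADAAAADAAAAADAAAAADAAAAADADAAAAAADA"
separate = "DDDDDDDDDDDDDD"

print("document runs, single file:   ", count_runs(interleaved))
print("document runs, dedicated file:", count_runs(separate))
```

In this model the dedicated document file is a single run, while the interleaved file needs one jump per group of document blocks.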
Generally, to get the most out of an operating system's I/O optimizations, don't save data with different access patterns in the same file. The exact impact differs very much from OS to OS, but this is one of the main reasons why databases (even MySQL) use different files for different parts of the database, unless they manage disk space independently of the OS with raw partitions.
All this assumes that you access and change attachments less often than your documents and that documents are considerably smaller than attachments. I would call this a safe bet for most scenarios.
Separate storage of attachments from the main database file
-----------------------------------------------------------

               Key: COUCHDB-61
               URL: https://issues.apache.org/jira/browse/COUCHDB-61
           Project: CouchDB
        Issue Type: New Feature
        Components: Database Core
       Environment: All
          Reporter: Jan Lehnardt
          Priority: Minor
At the moment all document- and attachment-data go into the same database file. It would be nice if the attachments could be saved in a different file. This would enable the use of slower and cheaper hardware for attachment storage and faster hardware for the document and index data storage.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
