[ https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Joseph Davis updated COUCHDB-754:
--------------------------------------

    Skill Level: Regular Contributors Level (Easy to Medium)

> Improve couch_file write performance
> ------------------------------------
>
>                 Key: COUCHDB-754
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-754
>             Project: CouchDB
>          Issue Type: Improvement
>         Environment: some code might be platform-specific
>            Reporter: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: cheaper-appending-v2.patch, cheaper-appending.patch
>
>
> I've got a number of possible enhancements to couch_file floating around in 
> my head and wanted to write them down.
> * Use fdatasync instead of fsync.  Filipe posted a patch to the OTP file 
> driver [1] that adds a new file:datasync/1 function.  I suspect that we won't 
> see much of a performance gain from this switch because we append to the file 
> and thus need to update the file metadata anyway.  On the other hand, I'm 
> fairly certain fdatasync is always safe for our needs, so if it is ever more 
> efficient we should use it.  Obviously, we'll need to fall back to 
> file:sync/1 on platforms where the datasync function is not available.
> * Use file:pwrite/2 to batch together multiple outstanding write requests.  
> This is essentially Paul's zip_server [2].  In order to take full advantage 
> of it we need to patch couch_btree to update nodes in parallel.  Currently 
> there should only be 1 outstanding write request in a couch_file at a time, 
> so it wouldn't help at all.
> * Open the file in append mode and stop seeking to eof in user space.  We 
> never modify files (aside from truncating, which is rare enough to be handled 
> separately), so perhaps it would help with performance if we let the kernel 
> deal with the seek.  We'd still need a way to get the file size for the 
> make_blocks function.  I'm wondering if file:read_file_info(Fd) is more 
> efficient than file:position(Fd, eof) for this purpose.
> A caveat - I'm not sure if append-only files are compatible with the previous 
> enhancement.  There is no file:write/2, and I have no idea how file:pwrite 
> behaves on a file which is opened append-only.  Is the Pos ignored, or is it 
> an error?  Will have to test.
> * Use O_DSYNC instead of fsync/fdatasync.  This one is inspired by antirez' 
> recent blog post [3] and some historical discussions on pgsql-performance.  
> Basically, it seems that opening a file with O_DSYNC (or O_SYNC on Linux, 
> which is currently the same thing) and doing all synchronous writes is 
> reasonably fast.  Antirez' tests showed 250 µs delays for (tiny) synchronous 
> writes, compared to 40 ms delays for fsync and fdatasync on his ext4 system.
> At the very least, this looks to be a compelling choice for file access when 
> the server is running with delayed_commits = true.  We'd need to patch the 
> OTP file driver again, and also investigate the cross-platform support.  In 
> particular, I don't think it works on NFS.
> [1]: http://github.com/fdmanana/otp/tree/fdatasync
> [2]: http://github.com/davisp/zip_server
> [3]: http://antirez.com/post/fsync-different-thread-useless.html
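
A minimal Erlang sketch of three of the ideas above: the datasync fallback, batching appends through file:pwrite/2, and the two ways of finding the file size. The module name couch_file_sketch and all function names are hypothetical and are not taken from couch_file or the attached patches. It assumes the file is opened in raw binary mode without the append flag, and it uses file:read_file_info/1 with a path rather than a file descriptor, since that is the documented form.

```erlang
-module(couch_file_sketch).
-export([sync/1, append_chunks/2, size_via_info/1, size_via_seek/1]).

%% Provides the #file_info{} record returned by file:read_file_info/1.
-include_lib("kernel/include/file.hrl").

%% Prefer file:datasync/1 and fall back to file:sync/1 on OTP builds
%% that do not export it.
sync(Fd) ->
    {module, file} = code:ensure_loaded(file),
    case erlang:function_exported(file, datasync, 1) of
        true -> file:datasync(Fd);
        false -> file:sync(Fd)
    end.

%% Append several binaries with a single file:pwrite/2 call.
%% Returns {ok, Offsets}, where Offsets holds the start position of
%% each chunk in the order given.
append_chunks(Fd, Bins) ->
    {ok, Eof} = file:position(Fd, eof),
    {LocBytes, _End} = lists:mapfoldl(
        fun(Bin, Pos) -> {{Pos, Bin}, Pos + iolist_size(Bin)} end,
        Eof, Bins),
    case file:pwrite(Fd, LocBytes) of
        ok -> {ok, [Pos || {Pos, _Bin} <- LocBytes]};
        {error, _} = Error -> Error
    end.

%% File size from the file's metadata, looked up by path.
size_via_info(Path) ->
    {ok, #file_info{size = Size}} = file:read_file_info(Path),
    Size.

%% File size obtained by seeking to the end of the file.
size_via_seek(Fd) ->
    {ok, Size} = file:position(Fd, eof),
    Size.
```

Timing size_via_info/1 against size_via_seek/1 on the target platform would be one way to answer the question raised in the third item. The sketch does not settle how file:pwrite/2 behaves on a file opened with the append flag; that still needs the test mentioned in the caveat.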
