[
https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Joseph Davis updated COUCHDB-754:
--------------------------------------
Skill Level: Regular Contributors Level (Easy to Medium)
> Improve couch_file write performance
> ------------------------------------
>
> Key: COUCHDB-754
> URL: https://issues.apache.org/jira/browse/COUCHDB-754
> Project: CouchDB
> Issue Type: Improvement
> Environment: some code might be platform-specific
> Reporter: Adam Kocoloski
> Fix For: 1.1
>
> Attachments: cheaper-appending-v2.patch, cheaper-appending.patch
>
>
> I've got a number of possible enhancements to couch_file floating around in
> my head, and wanted to write them down.
> * Use fdatasync instead of fsync. Filipe posted a patch to the OTP file
> driver [1] that adds a new file:datasync/1 function. I suspect that we won't
> see much of a performance gain from this switch because we append to the file
> and thus need to update the file metadata anyway. On the other hand, I'm
> fairly certain fdatasync is always safe for our needs, so if it is ever more
> efficient we should use it. Obviously, we'll need to fall back to
> file:sync/1 on platforms where the datasync function is not available.
> * Use file:pwrite/2 to batch together multiple outstanding write requests.
> This is essentially Paul's zip_server [2]. In order to take full advantage
> of it we need to patch couch_btree to update nodes in parallel. Currently
> there should only be 1 outstanding write request in a couch_file at a time,
> so it wouldn't help at all.
> * Open the file in append mode and stop seeking to eof in user space. We
> never modify files (aside from truncating, which is rare enough to be handled
> separately), so perhaps it would help with performance if we let the kernel
> deal with the seek. We'd still need a way to get the file size for the
> make_blocks function. I'm wondering if file:read_file_info(Fd) is more
> efficient than file:position(Fd, eof) for this purpose.
> A caveat - I'm not sure if append-only files are compatible with the previous
> enhancement. There is no file:write/2, and I have no idea how file:pwrite
> behaves on a file which is opened append-only. Is the Pos ignored, or is it
> an error? Will have to test.
> * Use O_DSYNC instead of fsync/fdatasync. This one is inspired by antirez'
> recent blog post [3] and some historical discussions on pgsql-performance.
> Basically, it seems that opening a file with O_DSYNC (or O_SYNC on Linux,
> which is currently the same thing) and doing all synchronous writes is
> reasonably fast. Antirez' tests showed 250 µs delays for (tiny) synchronous
> writes, compared to 40 ms delays for fsync and fdatasync on his ext4 system.
> At the very least, this looks to be a compelling choice for file access when
> the server is running with delayed_commits = true. We'd need to patch the
> OTP file driver again, and also investigate the cross-platform support. In
> particular, I don't think it works on NFS.
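The sync-related ideas in the list above (fdatasync with a fallback to fsync, and opening the file with O_DSYNC) can be sketched at the POSIX level. This is only an illustration in Python, not couch_file code and not the Erlang file driver. The helper names (sync_data, open_append, append_and_sync) are hypothetical, and the sketch assumes a Unix-like platform.

```python
import os
import tempfile


def sync_data(fd):
    """Flush file data to disk, preferring fdatasync where the platform has it.

    Returns the name of the call that was used, so callers can log it.
    """
    datasync = getattr(os, "fdatasync", None)
    if datasync is not None:
        datasync(fd)
        return "fdatasync"
    # Fallback for platforms that do not expose fdatasync.
    os.fsync(fd)
    return "fsync"


def open_append(path, dsync=False):
    """Open path for appending; optionally request synchronous data writes."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if dsync:
        # Adds nothing if the platform lacks O_DSYNC; the caller would then
        # still need an explicit sync_data() call for durability.
        flags |= getattr(os, "O_DSYNC", 0)
    return os.open(path, flags, 0o644)


def append_and_sync(path, payload, dsync=False):
    """Append payload to path, then flush it explicitly."""
    fd = open_append(path, dsync=dsync)
    try:
        os.write(fd, payload)
        return sync_data(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "demo.couch")
    print(append_and_sync(demo, b"header"))
    print(append_and_sync(demo, b"+doc", dsync=True))
```

Timing the two paths on the target filesystem would still be needed before drawing any conclusion about relative cost.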
> [1]: http://github.com/fdmanana/otp/tree/fdatasync
> [2]: http://github.com/davisp/zip_server
> [3]: http://antirez.com/post/fsync-different-thread-useless.html
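The open question in the append-mode item (how a positioned write behaves on a file opened for append, and how to get the file size without seeking) can be probed directly at the system call level. The sketch below uses Python's os module rather than Erlang's file module, so it only shows what the operating system does. The OTP driver may behave differently and would need its own test. The function names are hypothetical. The Linux pwrite(2) man page documents that, with O_APPEND, data is appended regardless of the offset, whereas POSIX says the offset should be honoured, so the probe reports the outcome instead of assuming one.

```python
import os
import tempfile


def size_via_fstat(fd):
    """File size from metadata, without moving the file offset."""
    return os.fstat(fd).st_size


def size_via_seek(fd):
    """File size obtained by seeking to the end of the file."""
    return os.lseek(fd, 0, os.SEEK_END)


def probe_pwrite_on_append(path):
    """Report what pwrite at offset 0 does on a file opened with O_APPEND."""
    flags = os.O_RDWR | os.O_CREAT | os.O_TRUNC | os.O_APPEND
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, b"AAAA")      # plain append
        os.pwrite(fd, b"BB", 0)    # positioned write at the start
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 16)
    finally:
        os.close(fd)
    if data == b"AAAABB":
        return "offset ignored (data appended)"
    if data == b"BBAA":
        return "offset honoured"
    return "unexpected contents: %r" % (data,)


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "probe.couch")
    print(probe_pwrite_on_append(demo))
```

Both size helpers should agree on the same descriptor. Which one is cheaper is a separate measurement.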