[jira] Commented: (COUCHDB-754) Investigate alternative couch_file writer implementations

Adam Kocoloski (JIRA) Mon, 03 May 2010 17:36:21 -0700

    [ 
https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863581#action_12863581
 ]


Adam Kocoloski commented on COUCHDB-754:
----------------------------------------

Here's a simple Erlang microbenchmark:

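%% Microbenchmark: time N appends of BinSize-byte random binaries via couch_file.
-module(appending_microbench).
-compile(export_all).

%% Returns the elapsed time for the appends, in microseconds.
test(BinSize, N) ->
    crypto:start(),
    %% Generate the payloads up front so that only the appends are timed.
    Data = [crypto:rand_bytes(BinSize) || _I <- lists:seq(1,N)],
    {ok, File} = couch_file:open("foo.couch", [create, overwrite]),
    T0 = now(),
    [couch_file:append_binary(File, Bin) || Bin <- Data],
    T1 = now(),
    timer:now_diff(T1, T0).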

If I do appending_microbench:test(4096, 1024) with tr...@940505 I can write 4MB 
of random junk in 47.2 ms (best of 5 tries).  With the 
cheaper-appending-v2.patch that number drops to 37.0 ms (again, best of 5).  
The measurement variance was small in both cases.

Interestingly, there's a huge difference in performance right around the 8KB 
mark.  These numbers are very reproducible on my laptop:

2> appending_microbench:test(8185, 1024).
43208
3> appending_microbench:test(8186, 1024).
215523

I don't know why it takes nearly 5x the time (about 215.5 ms versus 43.2 ms) to 
write out binaries which are 1 byte longer.  Very weird.  I see this with both 
trunk and the patch.
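
To pin down exactly where the jump happens, the size sweep and the "best of 5" 
step could be automated.  This is only a minimal sketch: the module name 
appending_sweep and the function sweep/3 are hypothetical, and it assumes the 
timed function returns microseconds, as appending_microbench:test/2 does.

```erlang
-module(appending_sweep).
-export([sweep/3]).

%% TestFun is a fun(BinSize) -> Microseconds (an assumed shape), for example
%%   fun(S) -> appending_microbench:test(S, 1024) end.
%% For each size in Sizes, run TestFun Tries times and keep the fastest run.
%% Returns a list of {Size, BestMicroseconds} tuples in the order of Sizes.
sweep(TestFun, Sizes, Tries) when Tries > 0 ->
    [{Size, lists:min([TestFun(Size) || _ <- lists:seq(1, Tries)])}
     || Size <- Sizes].
```

With the benchmark module loaded, something like 
appending_sweep:sweep(fun(S) -> appending_microbench:test(S, 1024) end, 
lists:seq(8180, 8190), 5) would show whether the cost changes at a single size 
or climbs gradually.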

> Investigate alternative couch_file writer implementations
> ---------------------------------------------------------
>
>                 Key: COUCHDB-754
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-754
>             Project: CouchDB
>          Issue Type: Improvement
>         Environment: some code might be platform-specific
>            Reporter: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: cheaper-appending-v2.patch, cheaper-appending.patch
>
>
> I've got a number of possible enhancements to couch_file floating around in 
> my head and wanted to write them down.
> * Use fdatasync instead of fsync.  Filipe posted a patch to the OTP file 
> driver [1] that adds a new file:datasync/1 function.  I suspect that we won't 
> see much of a performance gain from this switch because we append to the file 
> and thus need to update the file metadata anyway.  On the other hand, I'm 
> fairly certain fdatasync is always safe for our needs, so if it is ever more 
> efficient we should use it.  Obviously, we'll need to fall back to 
> file:sync/1 on platforms where the datasync function is not available.
> * Use file:pwrite/2 to batch together multiple outstanding write requests.  
> This is essentially Paul's zip_server [2].  In order to take full advantage 
> of it we need to patch couch_btree to update nodes in parallel.  Currently 
> there should only be 1 outstanding write request in a couch_file at a time, 
> so it wouldn't help at all.
> * Open the file in append mode and stop seeking to eof in user space.  We 
> never modify files (aside from truncating, which is rare enough to be handled 
> separately), so perhaps it would help with performance if we let the kernel 
> deal with the seek.  We'd still need a way to get the file size for the 
> make_blocks function.  I'm wondering if file:read_file_info(Fd) is more 
> efficient than file:position(Fd, eof) for this purpose.
> A caveat - I'm not sure if append-only files are compatible with the previous 
> enhancement.  There is no file:write/2, and I have no idea how file:pwrite 
> behaves on a file which is opened append-only.  Is the Pos ignored, or is it 
> an error?  Will have to test.
> * Use O_DSYNC instead of fsync/fdatasync.  This one is inspired by antirez' 
> recent blog post [3] and some historical discussions on pgsql-performance.  
> Basically, it seems that opening a file with O_DSYNC (or O_SYNC on Linux, 
> which is currently the same thing) and doing all synchronous writes is 
> reasonably fast.  Antirez' tests showed 250 µs delays for (tiny) synchronous 
> writes, compared to 40 ms delays for fsync and fdatasync on his ext4 system.
> At the very least, this looks to be a compelling choice for file access when 
> the server is running with delayed_commits = true.  We'd need to patch the 
> OTP file driver again, and also investigate the cross-platform support.  In 
> particular, I don't think it works on NFS.
> [1]: http://github.com/fdmanana/otp/tree/fdatasync
> [2]: http://github.com/davisp/zip_server
> [3]: http://antirez.com/post/fsync-different-thread-useless.html
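
Two of the ideas quoted above can be tried from an Erlang shell.  The sketch 
below is not taken from either patch: the module name couch_file_probes and 
both function names are hypothetical.  sync/1 assumes that checking whether 
the running OTP exports file:datasync/1 is an acceptable way to decide on the 
file:sync/1 fallback.  append_probe/1 assumes a scratch path that it may 
overwrite, and it reads the size with file:read_file_info/1 on the path rather 
than on the descriptor.

```erlang
-module(couch_file_probes).
-export([sync/1, append_probe/1]).
-include_lib("kernel/include/file.hrl").

%% Use file:datasync/1 when the running OTP exports it, otherwise fall back
%% to file:sync/1.
sync(Fd) ->
    _ = code:ensure_loaded(file),
    case erlang:function_exported(file, datasync, 1) of
        true  -> file:datasync(Fd);
        false -> file:sync(Fd)
    end.

%% Probe how file:pwrite/3 behaves on a file opened in append mode.
%% Path is a scratch file and is overwritten.  Returns
%% {PwriteResult, SizeViaPosition, SizeViaFileInfo, Contents} so that the
%% caller can see whether the requested offset was honored or ignored.
append_probe(Path) ->
    ok = file:write_file(Path, <<"0123456789">>),
    {ok, Fd} = file:open(Path, [append, raw, binary]),
    %% Ask for a write at offset 0 on the append-mode descriptor.
    PwriteResult = file:pwrite(Fd, 0, <<"AB">>),
    %% Two ways of learning the current file size.
    {ok, Eof} = file:position(Fd, eof),
    {ok, #file_info{size = Size}} = file:read_file_info(Path),
    ok = file:close(Fd),
    {ok, Contents} = file:read_file(Path),
    {PwriteResult, Eof, Size, Contents}.
```

Calling couch_file_probes:append_probe("probe.tmp") and inspecting Contents 
shows where the two bytes landed on a given platform.  The result may differ 
between operating systems, so it would have to be checked on each supported 
platform.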

