[
https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152525#comment-13152525
]
Paul Joseph Davis commented on COUCHDB-1342:
--------------------------------------------
@Damien
> Robert, the inflight batching of writes is limited to 1 meg per database.
No, it's up to 1 meg per file that's being written to. It's also important to
note that the buffering isn't a passive thing, as buffering generally is.
The "buffer" is actually just the mailbox of the writer_loop process. The
queued_bytes_len (or whatever it's called) just counts how much data has been
submitted to the process but not yet acked, to avoid blowing the top off that
mailbox (which is quite reasonable).
The writer isn't really buffering anything itself; it's just leaning on Erlang's
message passing internals to be that buffer. Then all the writer_loop does is
accept messages and respond to the parent couch_file gen_server. If it happens
to find multiple consecutive write messages already sitting in the mailbox,
it'll write them in a single call to
file:write/2.
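To make the "mailbox is the buffer" point concrete, here is a rough model of
that drain-and-write step. It is a Python sketch, not the Erlang from the
patch: queue.Queue stands in for the process mailbox, drain_and_write is a
made-up name, and it assumes the mailbox holds only write messages (the real
loop also has to deal with other message types).

```python
import io
import queue


def drain_and_write(mailbox: queue.Queue, out) -> int:
    """Wait for one chunk, grab whatever else is already queued, and
    write everything with a single out.write() call.

    Returns the number of chunks that were coalesced.
    """
    chunks = [mailbox.get()]  # block until at least one write arrives
    while True:
        try:
            # Take only what is already sitting in the mailbox;
            # never wait for more data to show up.
            chunks.append(mailbox.get_nowait())
        except queue.Empty:
            break
    out.write(b"".join(chunks))  # one write call for the whole batch
    return len(chunks)


if __name__ == "__main__":
    mailbox = queue.Queue()
    for chunk in (b"doc1", b"doc2", b"doc3"):
        mailbox.put(chunk)
    out = io.BytesIO()
    print(drain_and_write(mailbox, out), "chunks in one write")
```

Nothing here sits around waiting for a buffer to fill up: the batch is simply
whatever piled up while the previous write was in progress.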
I would not be at all surprised if it were shown that the bulk of the
improvement from this patch is due to this specific part of the patch. For the
curious, the zip_server test at [1] tests something quite similar to this setup.
[1] https://github.com/davisp/zip_server
> Asynchronous file writes
> ------------------------
>
> Key: COUCHDB-1342
> URL: https://issues.apache.org/jira/browse/COUCHDB-1342
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Reporter: Jan Lehnardt
> Fix For: 1.3
>
> Attachments: COUCHDB-1342.patch
>
>
> This change updates the file module so that it can do
> asynchronous writes. Basically it replies immediately
> to the process asking to write something to the file, with
> the position where the chunks will be written to the
> file, while a dedicated child process keeps collecting
> chunks and writing them to the file (batching them
> when possible). After issuing a series of write requests
> to the file module, the caller can call its 'flush'
> function, which will block the caller until all the
> chunks it requested to write are actually written
> to the file.
> This maximizes use of the IO subsystem since, for example,
> while the updater is traversing and modifying the btrees
> and doing CPU-bound tasks, the writes are happening in
> parallel.
> Originally described at http://s.apache.org/TVu
> Github Commit:
> https://github.com/fdmanana/couchdb/commit/e82a673f119b82dddf674ac2e6233cd78c123554
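The write-then-flush contract in the issue description above can be modeled
roughly as follows. Again this is a Python sketch under assumptions, not the
couch_file API: AsyncAppendFile, append and flush are hypothetical names, a
thread stands in for the dedicated child process, and this flush waits for
every queued chunk rather than only the caller's own.

```python
import io
import queue
import threading


class AsyncAppendFile:
    """Hypothetical model: append() returns the chunk's offset at once,
    while a background worker performs the actual writes."""

    def __init__(self, out):
        self._out = out
        self._eof = 0  # offset the next appended chunk will get
        self._lock = threading.Lock()
        self._mailbox = queue.Queue()
        worker = threading.Thread(target=self._writer_loop, daemon=True)
        worker.start()

    def append(self, chunk: bytes) -> int:
        # Reserve the position and enqueue under one lock so that queue
        # order always matches offset order.
        with self._lock:
            pos = self._eof
            self._eof += len(chunk)
            self._mailbox.put(chunk)
        return pos  # known before the data reaches the file

    def flush(self) -> None:
        # Block until every chunk queued so far has been written.
        self._mailbox.join()

    def _writer_loop(self) -> None:
        while True:
            chunk = self._mailbox.get()
            self._out.write(chunk)
            self._mailbox.task_done()


if __name__ == "__main__":
    f = AsyncAppendFile(io.BytesIO())
    positions = [f.append(b"header"), f.append(b"body")]
    f.flush()  # only now is the data guaranteed to be in the file
    print(positions)
```

The caller can keep doing CPU-bound work between append() and flush(), which
is where the overlap described in the issue comes from.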