[
https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151723#comment-13151723
]
Paul Joseph Davis commented on COUCHDB-1342:
--------------------------------------------
@Damien
That's an awful lot of disappointment packed into a single comment. First,
resorting to an ad hominem attack to insinuate that I'm not intelligent enough
to work on databases is quite disconcerting. Secondly, it's an egregious fallacy
to suggest that because a patch appears to be technically correct that it
should be committed. Thirdly, declaring what is and isn't a valid reason to
hold up a patch is not how the ASF works.
And now back to the regularly scheduled technical discussion.
First, couch_file:flush/1. Unless I'm missing something extremely subtle here,
its existence is so that clients can read their own writes. Yet the couch_file
gen_server has all the knowledge it needs to know if it has to flush to service
a read call. If the requested read position is between #file.eof and #file.eof
+ #file.queued_write_bytes, then it can call flush and move on with its life.
Not only does this mean that clients don't have to remember to call flush, but it
removes unnecessary message passing that every unconditional call to flush
would generate.
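A minimal sketch of that check, written in Python rather than Erlang purely to
show the logic. FileState, read_needs_flush and pread are hypothetical names,
not part of couch_file. It assumes #file.eof is the number of bytes already on
disk and that only the start position of the read needs checking.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Hypothetical stand-in for the #file record; only two fields are modeled."""
    eof: int                 # assumed: bytes already written to disk
    queued_write_bytes: int  # assumed: bytes accepted but still in the write queue


def read_needs_flush(state: FileState, pos: int) -> bool:
    """True when pos falls in the region that is queued but not yet written."""
    return state.eof <= pos < state.eof + state.queued_write_bytes


def pread(state: FileState, pos: int, flush, read_at):
    """Flush only when the requested position is still queued, then read.

    flush and read_at are caller-supplied callables standing in for the
    real writer and file read; both are assumptions of this sketch.
    """
    if read_needs_flush(state, pos):
        flush()
    return read_at(pos)
```

With this shape, a read below #file.eof never pays for a flush, and callers
never have to issue one themselves.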
Second, this is doubling the number of file descriptors required for anything
that isn't a database. On the first production machine I checked that's an
increase of 75% from 40K to 70K file descriptors. That's a fairly serious
change that ought to be discussed. At the very least it ought to be mentioned
somewhere so ops teams know to expect it.
Third, this is spawning long lived processes that aren't looping on exported
functions. After two code upgrades this would crash every couch_file in the VM
simultaneously.
Fourth, as I've mentioned numerous times before, the proper way to
synchronously start a process that might fail to initialize is to use
proc_lib:start_link and proc_lib:init_ack.
Fifth, has anyone considered using a write buffer outside of the couch_file API
that would allow clients more precise control? For instance, thinking briefly
on the view updater, you could buffer writes for a single add_remove call. This
also leads to the possibility that mostly-read views aren't needlessly holding
open a writer fd.
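A rough sketch of what such a client-side buffer could look like, again in
Python only to illustrate the idea. BufferedWriter and its append_fn and
start_pos parameters are hypothetical, and the sketch assumes a single writer,
so that offsets can be predicted before the bytes reach the file.

```python
class BufferedWriter:
    """Hypothetical client-side write buffer; not part of the couch_file API."""

    def __init__(self, append_fn, start_pos: int):
        self._append = append_fn    # assumed callable that appends bytes to the file
        self._next_pos = start_pos  # assumed current end-of-file offset
        self._chunks = []

    def write(self, chunk: bytes) -> int:
        """Queue a chunk and return the offset it will occupy once flushed."""
        pos = self._next_pos
        self._chunks.append(chunk)
        self._next_pos += len(chunk)
        return pos

    def flush(self) -> None:
        """Hand everything queued so far to the file in a single append."""
        if self._chunks:
            self._append(b"".join(self._chunks))
            self._chunks.clear()
```

A caller such as the view updater could create one of these per add_remove
call, write through it, and flush once at the end, so the file itself only
sees one batched append.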
> Asynchronous file writes
> ------------------------
>
> Key: COUCHDB-1342
> URL: https://issues.apache.org/jira/browse/COUCHDB-1342
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Reporter: Jan Lehnardt
> Fix For: 1.3
>
> Attachments: COUCHDB-1342.patch
>
>
> This change updates the file module so that it can do
> asynchronous writes. Basically it replies immediately
> to the process asking to write something to the file, with
> the position where the chunks will be written to the
> file, while a dedicated child process keeps collecting
> chunks and writing them to the file (batching them
> when possible). After issuing a series of write requests
> to the file module, the caller can call its 'flush'
> function which will block the caller until all the
> chunks it requested to write are effectively written
> to the file.
> This maximizes use of the IO subsystem, as for example, while
> the updater is traversing and modifying the btrees and
> doing CPU bound tasks, the writes are happening in
> parallel.
> Originally described at http://s.apache.org/TVu
> Github Commit:
> https://github.com/fdmanana/couchdb/commit/e82a673f119b82dddf674ac2e6233cd78c123554