[
https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152222#comment-13152222
]
Damien Katz commented on COUCHDB-1342:
--------------------------------------
I don't mean to imply that Paul, or any committer, isn't smart enough to handle
a flush call. I _know_ Paul has the smarts and talent to deal with much more
complexity. What I am saying is that if a flush call requirement makes it so
that someone can't work on the internals of CouchDB, then they aren't suited
for core database development. Database engines are complex beasts.
Paul's point that the flush call can maybe be gotten rid of seems
right. Originally, we didn't have the code that prevented the write queue
getting overwhelmed, because in our product it's not possible. But I added it
to make the rest of the enhancements suitable for Apache, and now it seems it
could be used to prevent the reads of unflushed data. However, there is another
optimization coming where a raw erlang FD is used in a calling process to avoid
messaging overhead (another big performance improvement in certain long
operations), which will maybe make it necessary again. We can remove it in the
meantime, but it may need to be added back in the future.
The concern with doubling the # of non-db file descriptors is a real one. How
big of a concern is this? Do you have ideas on how to fix this? Can we address
this post check-in?
Your 3rd and 4th concerns aren't Apache user concerns, but can be easily
addressed after check-in. I have no objections, but I would prefer we have a
culture of small changes/environment specific changes like that happening after
check-in. That will increase the rate of progress on the project in general.
If you agree, would you be willing to add those changes post check-in?
The 5th concern would definitely make code more complicated for callers, and
would involve them usually batching a non-optimal amount of data. This code
makes the batching automatic and parallelizes the writes, retiring batched data
as fast as it can and preventing the batching of too much data.
> Asynchronous file writes
> ------------------------
>
> Key: COUCHDB-1342
> URL: https://issues.apache.org/jira/browse/COUCHDB-1342
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Reporter: Jan Lehnardt
> Fix For: 1.3
>
> Attachments: COUCHDB-1342.patch
>
>
> This change updates the file module so that it can do
> asynchronous writes. Basically it replies immediately
> to the process asking to write something to the file, with
> the position where the chunks will be written to the
> file, while a dedicated child process keeps collecting
> chunks and writes them to the file (batching them
> when possible). After issuing a series of write requests
> to the file module, the caller can call its 'flush'
> function which will block the caller until all the
> chunks it requested to write are effectively written
> to the file.
> This maximizes use of the IO subsystem, as for example, while
> the updater is traversing and modifying the btrees and
> doing CPU bound tasks, the writes are happening in
> parallel.
> Originally described at http://s.apache.org/TVu
> Github Commit:
> https://github.com/fdmanana/couchdb/commit/e82a673f119b82dddf674ac2e6233cd78c123554
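
The write/flush protocol described in the quoted issue can be sketched roughly
as follows. This is a minimal illustration in Python, not the CouchDB patch
itself (which is Erlang); the class name AsyncFile and its methods append and
flush are hypothetical. It also simplifies one point: flush here waits for all
queued chunks, not only those requested by the calling process.

```python
import io
import queue
import threading


class AsyncFile:
    """Hypothetical append-only file wrapper with asynchronous writes."""

    def __init__(self, fileobj):
        self._file = fileobj
        self._next_pos = 0            # offset the next chunk will land at
        self._chunks = queue.Queue()  # chunks waiting for the writer thread
        self._lock = threading.Lock()
        self._writer = threading.Thread(target=self._write_loop, daemon=True)
        self._writer.start()

    def append(self, data):
        """Queue data and return at once with its future file offset."""
        # The lock keeps offset assignment and queue order consistent.
        with self._lock:
            pos = self._next_pos
            self._next_pos += len(data)
            self._chunks.put(data)
        return pos

    def flush(self):
        """Block until every chunk queued so far has been written."""
        self._chunks.join()

    def _write_loop(self):
        while True:
            batch = [self._chunks.get()]
            # Batch whatever else is already queued into a single write.
            while True:
                try:
                    batch.append(self._chunks.get_nowait())
                except queue.Empty:
                    break
            self._file.write(b"".join(batch))
            self._file.flush()
            for _ in batch:
                self._chunks.task_done()


# Usage: the caller learns each position immediately, keeps working,
# and only blocks when it calls flush().
buf = io.BytesIO()
f = AsyncFile(buf)
p1 = f.append(b"header")  # position 0
p2 = f.append(b"body")    # position 6
f.flush()
```

The caller never decides batch sizes; the writer thread simply takes whatever
has accumulated since its last write, which is the "automatic batching" point
made in the comment above.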