[ 
https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152222#comment-13152222
 ] 

Damien Katz commented on COUCHDB-1342:
--------------------------------------

I don't mean to imply that Paul, or any committer, isn't smart enough to handle 
a flush call. I _know_ Paul has the smarts and talent to deal with much more 
complexity. What I am saying is that if a flush call requirement makes it so 
that someone can't work on the internals of CouchDB, then they aren't suited 
for core database development. Database engines are complex beasts.

Paul's point that the flush call can maybe be gotten rid of seems 
right. Originally, we didn't have the code that prevented the write queue 
getting overwhelmed, because in our product it's not possible. But I added it 
to make the rest of the enhancements suitable for Apache, and now it seems it 
could be used to prevent the reads of unflushed data. However, there is another 
optimization coming where a raw erlang FD is used in a calling process to avoid 
messaging overhead (another big performance improvement in certain long 
operations), which will maybe make it necessary again. We can remove it in the 
meantime, but it may need to be added back in the future.

The concern with doubling the number of non-db file descriptors is a real one. 
How big of a concern is this? Do you have ideas on how to fix this? Can we 
address this post check-in?

Your 3rd and 4th concerns aren't Apache user concerns, but can be easily 
addressed after check-in. I have no objections, but I would prefer we have a 
culture of small changes/environment specific changes like that happening after 
check-in. That will increase the rate of progress on the project in general. 
If you agree, would you be willing to add those changes post check-in?

The 5th concern would definitely make code more complicated for callers, and 
would involve them batching a usually non-optimal amount of data. This code 
makes the batching automatic and parallelizes the writes, retiring batched data 
as fast as it can, and prevents the batching of too much data.
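
To illustrate what "automatic batching with a cap" could look like, here is a minimal Python sketch. It is not the patch's Erlang code; `take_batch`, `pending` and `max_batch_bytes` are hypothetical names chosen for this example. The idea is that the writer, not the caller, decides how much queued data goes into each write, and stops before a batch grows too large.

```python
from collections import deque


def take_batch(pending: deque, max_batch_bytes: int) -> bytes:
    """Pop queued chunks into one batch without exceeding max_batch_bytes.

    At least one chunk is always taken, so a single oversized chunk
    cannot stall the writer. Returns b"" when nothing is pending.
    """
    if not pending:
        return b""
    first = pending.popleft()
    batch = [first]
    size = len(first)
    # Keep adding chunks while the next one still fits under the cap.
    while pending and size + len(pending[0]) <= max_batch_bytes:
        chunk = pending.popleft()
        batch.append(chunk)
        size += len(chunk)
    return b"".join(batch)
```

A writer loop would call `take_batch` repeatedly and issue one write per returned batch, so callers never have to guess a batch size themselves.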
                
> Asynchronous file writes
> ------------------------
>
>                 Key: COUCHDB-1342
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1342
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Jan Lehnardt
>             Fix For: 1.3
>
>         Attachments: COUCHDB-1342.patch
>
>
> This change updates the file module so that it can do
> asynchronous writes. Basically it replies immediately
> to the process asking to write something to the file, with
> the position where the chunks will be written to the
> file, while a dedicated child process keeps collecting
> chunks and writing them to the file (batching them
> when possible). After issuing a series of write requests
> to the file module, the caller can call its 'flush'
> function which will block the caller until all the
> chunks it requested to write are effectively written
> to the file.
> This maximizes use of the IO subsystem: for example, while
> the updater is traversing and modifying the btrees and
> doing CPU bound tasks, the writes are happening in
> parallel.
> Originally described at http://s.apache.org/TVu
> Github Commit: 
> https://github.com/fdmanana/couchdb/commit/e82a673f119b82dddf674ac2e6233cd78c123554
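
The behaviour described in the issue can be sketched roughly as follows. This is an illustrative Python analogue under stated assumptions, not the Erlang implementation from the patch: `AsyncFile`, `append` and `close` are hypothetical names, a thread stands in for the dedicated child process, and `flush` here waits for all queued chunks rather than only the caller's own.

```python
import queue
import threading


class AsyncFile:
    """Hypothetical append-only file with a background writer thread."""

    def __init__(self, path):
        self._fd = open(path, "ab")
        self._eof = self._fd.tell()  # next position to hand out
        self._lock = threading.Lock()
        self._chunks = queue.Queue()
        self._writer = threading.Thread(target=self._write_loop, daemon=True)
        self._writer.start()

    def append(self, chunk: bytes) -> int:
        """Reply immediately with the position the chunk will occupy."""
        with self._lock:
            pos = self._eof
            self._eof += len(chunk)
            self._chunks.put(chunk)  # enqueue under the lock to keep order
        return pos

    def flush(self) -> None:
        """Block until every chunk queued so far has reached the file."""
        self._chunks.join()

    def close(self) -> None:
        self.flush()
        self._chunks.put(None)  # sentinel: tell the writer to stop
        self._writer.join()
        self._fd.close()

    def _write_loop(self) -> None:
        while True:
            item = self._chunks.get()
            if item is None:
                self._chunks.task_done()
                return
            batch, stop = [item], False
            # Batch whatever else is already waiting in the queue.
            while True:
                try:
                    nxt = self._chunks.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            self._fd.write(b"".join(batch))
            self._fd.flush()
            for _ in batch:
                self._chunks.task_done()
            if stop:
                self._chunks.task_done()
                return
```

A caller would issue several `append` calls, continue with CPU-bound work while the writer drains the queue, and call `flush` only at the point where the data must be in the file.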
