[ 
https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151430#comment-13151430
 ] 

Paul Joseph Davis commented on COUCHDB-1342:
--------------------------------------------


> These are deliberate. Currently we are maintaining an Fd/couch_file pair for 
> reads and writes in a bit of a clumsy way in the #btree.fd record and the 
> couch_file module and do a switcheroo for writes vs. reads. Now all of that 
> is moved into couch_file and couch_db doesn't have to concern itself with the 
> details of efficient file access.

Oh I hadn't noticed that weirdness before, but I see it now.

> Good point, API-wise this is a bit leaky and we should definitely look into 
> making this better, but I don't think this blocks the bulk of the 
> improvements that this patch introduces. I'm happy to open a new ticket about 
> this. Good starting points are calling flush() inside couch_file 1) after 
> writing a header and 2) if a read fails with {error, eof} (fwiw, Damien tried 
> the latter before but removed it again, the details elude me). 

I disagree fairly strongly on this point. couch_file is about as core of an API 
as it gets. couch_file:flush/1 shouldn't even exist as far as I'm concerned. 
This isn't a mediocre API that should be improved later, its a bad API that 
shouldn't go in at all. Its 100% foot-gun.

> The difference is that delayed_write does buffering whereas we just want to 
> keep writing to the file all the time. delayed_write would not write until 
> the buffer is full or the timeout is hit. In addition, we wouldn't be able to 
> tell for delayed_commits=false writes when data hit the file reliably. 

Oh right, I remember that discussion about clearing the buffer being an issue 
now. Good call.

> The whole point of this patch is to make couch_file smarter and move the 
> reader/writer abstraction further down the stack and reap the associated 
> performance benefits. Unless we have an alternate patch, we can't really 
> compare this. 

Code Review 2.0:

  Q: "This looks complicated, can we think about trying to simplify some of the 
organization?"
  A: "No. Write it yourself if you care that much."

I don't need another patch to know that this one is complicated and could be 
less complicated.

> This doesn't change the error scenarios. We already rely on monitoring to 
> tell the request process a couch_file write didn't work

You're pointing at code that's about three abstractions away from couch_file's 
writer loop. Suffice it to say, erlang:monitor/2 on a random process in the 
write path does little to assuage my fears that we're entering dangerous 
territory relying on our own file writing buffers.

                
> Asynchronous file writes
> ------------------------
>
>                 Key: COUCHDB-1342
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1342
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Jan Lehnardt
>             Fix For: 1.3
>
>         Attachments: COUCHDB-1342.patch
>
>
> This change updates the file module so that it can do
> asynchronous writes. Basically it replies immediately
> to process asking to write something to the file, with
> the position where the chunks will be written to the
> file, while a dedicated child process keeps collecting
> chunks and write them to the file (and batching them
> when possible). After issuing a series of write request
> to the file module, the caller can call its 'flush'
> function which will block the caller until all the
> chunks it requested to write are effectively written
> to the file.
> This maximizes the IO subsystem, as for example, while
> the updater is traversing and modifying the btrees and
> doing CPU bound tasks, the writes are happening in
> parallel.
> Originally described at http://s.apache.org/TVu
> Github Commit: 
> https://github.com/fdmanana/couchdb/commit/e82a673f119b82dddf674ac2e6233cd78c123554

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to