[jira] [Commented] (COUCHDB-1342) Asynchronous file writes

Paul Joseph Davis (Commented) (JIRA) Thu, 17 Nov 2011 16:40:15 -0800

    [ 
https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152507#comment-13152507
 ]


Paul Joseph Davis commented on COUCHDB-1342:
--------------------------------------------


@Damien

> However, there is another optimization coming where a raw erlang FD is used 
> in a calling process to avoid messaging overhead (another big performance 
> improvement in certain long operations), which will maybe make it necessary 
> again. 

I'm not sure what you mean here. Something along the lines of a file:open call 
in the couch_db_updater process (and couch_mrview_updater)? If so that's an 
interesting idea. Seems like we could make couch_file handle that quite easily 
along the lines of how file handles #prim_file vs #file (if I recall those 
record names correctly). This could also solve some of the fd duplication if we 
only need an extra fd for views that are updating.

> The concern with doubling the # non-db file descriptors is a real one. How 
> big of a concern is this?

The thing is, I'm not certain how it'll behave. Hence why it concerns me. Is it 
a matter of just making sure that ulimit is set sufficiently high? How high is 
sufficient? If I'm running in production, and I upgrade to a version of CouchDB 
that has this patch, can I at least guesstimate how configs might need to 
change? Maybe I'm being overly paranoid and it's not an issue. I dunno. Hence 
why it concerns me.
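
To make the question concrete, here is a rough sketch of the kind of
back-of-the-envelope check an operator might run. It assumes one descriptor
per open database file and either one or two per open non-db file (for
example a view index); those per-file counts, the function names, and the
reserve value are illustrative assumptions, not figures taken from the patch.

```python
import resource


def estimated_fds(open_dbs, open_non_db_files, doubled=True):
    """Rough descriptor estimate.

    Assumption: one fd per database file, and one fd per non-db file
    before the change or two after it. Real numbers may differ.
    """
    per_non_db = 2 if doubled else 1
    return open_dbs + per_non_db * open_non_db_files


def headroom(needed, reserve=64):
    """Descriptors left under the soft limit after `needed` plus a reserve.

    The reserve (sockets, logs, and so on) is a hypothetical placeholder.
    Returns None when the soft limit is unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - needed - reserve


if __name__ == "__main__":
    before = estimated_fds(1000, 400, doubled=False)
    after = estimated_fds(1000, 400, doubled=True)
    print("estimated fds before:", before, "after:", after)
    print("headroom after:", headroom(after))
```

A negative headroom would suggest raising ulimit before upgrading, but the
estimate is only as good as the assumed per-file counts.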

> The 5th concern would definitely make code more complicated for callers

I agree. I should've prefaced that bit with an "I wonder if in the future 
there's a follow up direction we can go". It only occurred as I was finishing 
that comment so I figured I'd write it down.

> Your 3rd and 4th concerns aren't Apache user concerns, but can be easily 
> addressed after check-in.

I have no idea what you mean by "Apache user concerns" here. If you're 
referring to "no one cares how the sausage is made so long as it's faster" then 
I'm going to have to disagree. Strongly. Saying that databases are complicated 
so we shouldn't concern ourselves with code quality is just going to leave us 
with a source tree in an even worse state than it already is.

And I'd like to address this argument about progress and the desires of users. 
This patch was submitted to JIRA yesterday. My initial review was up within 
3.5h. This patch changes how the file abstraction works. In a database. As far 
as I'm concerned development on this started yesterday at 13:28 when Jan 
uploaded the patch to JIRA. If you wanted things to be moving more quickly at 
this stage you should have been developing this on a branch in git and asking 
for input from the community.

Secondly, while I understand that you're highly motivated to help users by 
improving performance, what does that have to do with the conversation about 
the technical merits of this patch? This sense of urgency that progress must be 
made so let's address the issues I brought up after it's in trunk is not a 
convincing argument. You could address my comments by spending thirty minutes 
in an editor and resubmitting the patch. Instead you're asking me to clean this 
up for you after it's committed.

Thirdly, every time someone asks, "Can it wait till it's on trunk?", all I hear 
is, "Can I ignore what you just said and commit this anyway?" If I point at 
something and say that it's broken, it's because I'm expecting the patch to change 
or an explanation of why I'm wrong. And I'm fine being wrong. It happens quite 
often. But this pattern of submitting patches and asking for all concerns to be 
addressed after the patch is in trunk is starting to get a bit annoying. If we 
want to adjust our policies around CTR vs RTC for larger patches, that's fine. 
Perhaps adding an edge branch in git that will accept all our bigger somewhat 
scary commits would be beneficial. If we start doing automated package building 
then users could even pull bleeding edge code to test. But I digress.

                
> Asynchronous file writes
> ------------------------
>
>                 Key: COUCHDB-1342
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1342
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Jan Lehnardt
>             Fix For: 1.3
>
>         Attachments: COUCHDB-1342.patch
>
>
> This change updates the file module so that it can do
> asynchronous writes. Basically it replies immediately
> to the process asking to write something to the file, with
> the position where the chunks will be written to the
> file, while a dedicated child process keeps collecting
> chunks and writes them to the file (batching them
> when possible). After issuing a series of write requests
> to the file module, the caller can call its 'flush'
> function which will block the caller until all the
> chunks it requested to write are effectively written
> to the file.
> This maximizes the IO subsystem, as for example, while
> the updater is traversing and modifying the btrees and
> doing CPU bound tasks, the writes are happening in
> parallel.
> Originally described at http://s.apache.org/TVu
> Github Commit: 
> https://github.com/fdmanana/couchdb/commit/e82a673f119b82dddf674ac2e6233cd78c123554
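
The scheme described above can be sketched in a few lines. This is a minimal
illustration in Python, not the patch itself (which is Erlang and lives in
couch_file): the class name AsyncFile and its methods are hypothetical, a
thread stands in for the dedicated child process, and error handling is
omitted.

```python
import os
import queue
import threading


class AsyncFile:
    """Append-only file whose writes are performed by a background thread."""

    def __init__(self, path):
        self._f = open(path, "ab")
        self._eof = os.path.getsize(path)  # position of the next append
        self._lock = threading.Lock()
        self._q = queue.Queue()
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def append(self, chunk):
        """Reply immediately with the position the chunk will occupy."""
        with self._lock:
            pos = self._eof
            self._eof += len(chunk)
            self._q.put(chunk)  # enqueue under the lock to keep file order
        return pos

    def flush(self):
        """Block until every chunk queued so far has been written."""
        self._q.join()

    def close(self):
        self._q.put(None)  # sentinel that stops the writer
        self._writer.join()
        self._f.close()

    def _run(self):
        stop = False
        while not stop:
            batch = [self._q.get()]
            # Batch whatever else is already waiting in the queue.
            while True:
                try:
                    batch.append(self._q.get_nowait())
                except queue.Empty:
                    break
            if None in batch:
                stop = True
            data = b"".join(c for c in batch if c is not None)
            if data:
                self._f.write(data)
                self._f.flush()
            # Mark items done only after they reached the file object.
            for _ in batch:
                self._q.task_done()
```

A caller would issue several append calls, keep doing CPU-bound work with the
returned positions, and call flush only when it needs the data to be in the
file. Note that this sketch flushes Python's buffer to the OS but does not
fsync.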

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
