Re: About possibly reverting COUCHDB-767

Filipe David Manana Sun, 07 Nov 2010 12:30:12 -0800

On Sun, Nov 7, 2010 at 8:09 PM, Adam Kocoloski <[email protected]> wrote:
> On Nov 7, 2010, at 2:52 PM, Filipe David Manana wrote:
>
>> On Sun, Nov 7, 2010 at 7:20 PM, Adam Kocoloski <[email protected]> wrote:
>>> On Nov 7, 2010, at 11:35 AM, Filipe David Manana wrote:
>>>
>>>> Also, with this patch I verified (on Solaris, with the 'zpool iostat
>>>> 1' command) that when running a writes only test with relaximation
>>>> (200 write processes), disk write activity is not continuous. Without
>>>> this patch, there's continuous (every 1 second) write activity.
>>>
>>> I'm confused by this statement. You must be talking about relaximation runs 
>>> with delayed_commits = true, right?  Why do you think you see larger 
>>> intervals between write activity with the optimization from COUCHDB-767?  
>>> Have you measured the time it takes to open the extra FD?  In my tests that 
>>> was a sub-millisecond operation, but maybe you've uncovered something else.
>>
>> No, it happens for tests with delayed_commits = false. The only
>> possible explanation I see for the variance might be related to the
>> Erlang VM scheduler decisions about when to start/run that process.
>> Nevertheless, I don't know the exact cause, but the fsync run frequency
>> varies a lot.
>
> I think it's worth investigating.  I couldn't reproduce it on my plain-old 
> spinning disk MacBook with 200 writers in relaximation; the IOPS reported by 
> iostat stayed very uniform.
>
>>>> For the goal of not having readers getting blocked by fsync calls (and
>>>> write calls), I would propose using a separate couch_file process just
>>>> for read operations. I have a branch in my github for this (with
>>>> COUCHDB-767 reverted). It needs to be polished, but the relaximation
>>>> tests are very positive, both reads and writes get better response
>>>> times and throughput:
>>>>
>>>> https://github.com/fdmanana/couchdb/tree/2_couch_files_no_batch_reads
>>>
>>> I'd like to propose an alternative optimization, which is to keep a 
>>> dedicated file descriptor open in the couch_db_updater process and use that 
>>> file descriptor for _all_ IO initiated by the db_updater.  The advantage is 
>>> that the db_updater does not need to do any message passing for disk IO, 
>>> and thus does not slow down when the incoming message queue is large.  A 
>>> message queue much much larger than the number of concurrent writers can 
>>> occur if a user writes with batch=ok, and it can also happen rather easily 
>>> in a BigCouch cluster.
>>
>> I don't see how that will improve things, since all write operations
>> will still be done in a serialized manner. Since only couch_db_updater
>> writes to the DB file, and since access to the couch_db_updater is
>> serialized, to me it only seems that your solution avoids one level
>> of indirection (the couch_file process). I don't see how, when using a
>> couch_file only for writes, you get the message queue for that
>> couch_file process full of write messages.
>
> It's the db_updater which gets a large message queue, not the couch_file.  
> The db_updater ends up with a big backlog of update_docs messages that get in 
> the way when it needs to make gen_server calls to the couch_file process for 
> IO.  It's a significant problem in R13B, probably less so in R14B because of 
> some cool optimizations by the OTP team.


So, let me see if I get it. The couch_db_updater process is slow to
pick up the results of its calls to the couch_file process because its
mailbox is full of update_docs messages?
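
If that reading is right, the cost could be modelled roughly as below.
This is a toy sketch in Python, not CouchDB or OTP code: the mailbox is
a plain list, and the names selective_receive, simulate_call and
skip_old_messages are made up for illustration. It assumes the reply
from couch_file is matched by a unique reference and arrives behind the
update_docs backlog. The skip_old_messages flag stands in for my
understanding of the R14 receive optimization (skipping messages that
were already queued when the reference was created), which is an
assumption on my part.

```python
def selective_receive(mailbox, wanted_ref, start=0):
    """Scan the mailbox from index `start` for a message tagged wanted_ref.

    Removes the matching message and returns (payload, inspected), where
    `inspected` is how many messages were looked at before the match.
    """
    inspected = 0
    for i in range(start, len(mailbox)):
        inspected += 1
        tag, payload = mailbox[i]
        if tag == wanted_ref:
            del mailbox[i]
            return payload, inspected
    raise LookupError("no matching reply in mailbox")


def simulate_call(backlog, skip_old_messages):
    """Model one synchronous call made while `backlog` messages are queued.

    Returns the number of messages inspected to find the reply.
    """
    # Hypothetical backlog of client writes waiting at the updater.
    mailbox = [("update_docs", n) for n in range(backlog)]
    # Position of the mailbox tail at the moment the call is issued.
    mark = len(mailbox) if skip_old_messages else 0
    ref = "reply-ref"  # stands in for a freshly created unique reference
    mailbox.append((ref, "ok"))  # the reply lands behind the backlog
    _, inspected = selective_receive(mailbox, ref, start=mark)
    return inspected


if __name__ == "__main__":
    for backlog in (0, 1000, 100000):
        print(backlog,
              simulate_call(backlog, skip_old_messages=False),
              simulate_call(backlog, skip_old_messages=True))
```

In this model every IO call scans the whole backlog before it finds its
reply, so the per-call cost grows linearly with the queue length, while
with the flag set it stays constant. Whether the real VM behaves this
way for the couch_file calls would have to be measured.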

>
>> Also, what I did on that branch is a bit more generic, as it works for
>> view index files as well, and doesn't introduce significant changes
>> elsewhere except in couch_file.erl. Of course your solution might be
>> extended to the view updater process as well easily, I don't have
>> anything against it.
>>
>> Anyway, +1.
>
> I do like that the work you did applies immediately to the view group files.  
> Applying what I'm proposing to the view updater would probably be easy, but 
> not "zero lines changed" easy.  On the other hand, the problem I'm trying to 
> avoid is a non-issue with views, since they're never updated directly by 
> clients.  Best,
>
> Adam
>
>



-- 
Filipe David Manana,
[email protected], [email protected]

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
