On Sun, Nov 7, 2010 at 8:09 PM, Adam Kocoloski <[email protected]> wrote:
> On Nov 7, 2010, at 2:52 PM, Filipe David Manana wrote:
>
>> On Sun, Nov 7, 2010 at 7:20 PM, Adam Kocoloski <[email protected]> wrote:
>>> On Nov 7, 2010, at 11:35 AM, Filipe David Manana wrote:
>>>
>>>> Also, with this patch I verified (on Solaris, with the 'zpool iostat
>>>> 1' command) that when running a writes only test with relaximation
>>>> (200 write processes), disk write activity is not continuous. Without
>>>> this patch, there's continuous (every 1 second) write activity.
>>>
>>> I'm confused by this statement. You must be talking about relaximation runs
>>> with delayed_commits = true, right? Why do you think you see larger
>>> intervals between write activity with the optimization from COUCHDB-767?
>>> Have you measured the time it takes to open the extra FD? In my tests that
>>> was a sub-millisecond operation, but maybe you've uncovered something else.
>>
>> No, it happens for tests with delayed_commits = false. The only
>> possible explanation I see for the variance might be related to the
>> Erlang VM scheduler decisions about when to start/run that process.
>> Nevertheless, I don't know the exact cause, but the fsync run frequency
>> varies a lot.
>
> I think it's worth investigating. I couldn't reproduce it on my plain-old
> spinning disk MacBook with 200 writers in relaximation; the IOPS reported by
> iostat stayed very uniform.
>
>>>> For the goal of not having readers getting blocked by fsync calls (and
>>>> write calls), I would propose using a separate couch_file process just
>>>> for read operations. I have a branch in my github for this (with
>>>> COUCHDB-767 reverted). It needs to be polished, but the relaximation
>>>> tests are very positive, both reads and writes get better response
>>>> times and throughput:
>>>>
>>>> https://github.com/fdmanana/couchdb/tree/2_couch_files_no_batch_reads
>>>
>>> I'd like to propose an alternative optimization, which is to keep a
>>> dedicated file descriptor open in the couch_db_updater process and use that
>>> file descriptor for _all_ IO initiated by the db_updater. The advantage is
>>> that the db_updater does not need to do any message passing for disk IO,
>>> and thus does not slow down when the incoming message queue is large. A
>>> message queue much much larger than the number of concurrent writers can
>>> occur if a user writes with batch=ok, and it can also happen rather easily
>>> in a BigCouch cluster.
>>
>> I don't see how that will improve things, since all write operations
>> will still be done in a serialized manner. Since only couch_db_updater
>> writes to the DB file, and since access to the couch_db_updater is
>> serialized, to me it only seems that your solution avoids one level
>> of indirection (the couch_file process). I don't see how, when using a
>> couch_file only for writes, you get the message queue for that
>> couch_file process full of write messages.
>
> It's the db_updater which gets a large message queue, not the couch_file.
> The db_updater ends up with a big backlog of update_docs messages that get in
> the way when it needs to make gen_server calls to the couch_file process for
> IO. It's a significant problem in R13B, probably less so in R14B because of
> some cool optimizations by the OTP team.

So, let me see if I get it. The couch_db_updater process is slow to pick up
the results of its calls to the couch_file process because its mailbox is
full of update_docs messages?

>
>> Also, what I did on that branch is a bit more generic, as it works for
>> view index files as well, and doesn't introduce significant changes
>> elsewhere except in couch_file.erl. Of course your solution might be
>> extended to the view updater process as well easily, I don't have
>> anything against it.
>>
>> Anyway, +1.
>
> I do like that the work you did applies immediately to the view group files.
> Applying what I'm proposing to the view updater would probably be easy, but
> not "zero lines changed" easy. On the other hand, the problem I'm trying to
> avoid is a non-issue with views, since they're never updated directly by
> clients. Best,
>
> Adam
>
>

--
Filipe David Manana,
[email protected], [email protected]

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
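
The mailbox question in this message can be illustrated with a toy model. This is only a sketch in Python, not Erlang and not CouchDB code. It assumes a FIFO mailbox and a selective receive that scans from the oldest message onward, and the names selective_receive and io_call_cost are made up for the illustration. It counts how many queued messages get inspected before the reply to a single IO call is found.

```python
def selective_receive(mailbox, wanted):
    """Remove and return the first message for which wanted(message) is true,
    together with the number of messages inspected to find it."""
    inspected = 0
    for index, message in enumerate(mailbox):
        inspected += 1
        if wanted(message):
            del mailbox[index]
            return message, inspected
    return None, inspected


def io_call_cost(backlog):
    """Messages inspected to pick up one IO reply behind `backlog` requests."""
    # Hypothetical backlog of client requests already queued at the updater.
    mailbox = [("update_docs", n) for n in range(backlog)]
    # The reply from the file process arrives behind that backlog.
    mailbox.append(("io_reply", "ok"))
    _, inspected = selective_receive(mailbox, lambda m: m[0] == "io_reply")
    return inspected


if __name__ == "__main__":
    for backlog in (0, 200, 100_000):
        print(backlog, io_call_cost(backlog))
```

In this model the number of messages inspected is always the backlog plus one, so every IO call gets slower as update_docs messages pile up. Doing the IO on a file descriptor owned by the updater involves no reply message, so there is nothing to scan for. The thread itself says R14B is probably less affected than R13B, so the model is closer to the older behaviour.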
