This is now COUCHDB-517 and includes a patch that fixes the problem.

B.

On Sat, Oct 3, 2009 at 2:34 PM, Robert Newson <[email protected]> wrote:
> I should point out that my test does this;
>
> 1) PUT _config/uuid/algorithm with "random"
> 2) insert some documents
> 3) PUT _config/uuid/algorithm with "sequential"
> 4) insert some documents
>
> If you loop that, and insert as few as 10 documents at 2) and 4), you
> will get a connection refused and the stacktrace output, within 60
> seconds.
>
>
> On Sat, Oct 3, 2009 at 2:33 PM, Robert Newson <[email protected]> wrote:
>> Ok, I've got a little further. If I change my test to much shorter runs
>> (even 10 documents), I can reproduce the connection refused symptom
>> and the stacktrace I pasted originally in under a minute, every time.
>>
>> What appears to be happening is that the couch_uuids gen_server is
>> failing (being restarted too frequently), part of the supervision tree
>> is torn down and rebuilt, and a concurrent write operation fails while
>> that is happening. Since I'm pretty sure that's not what should happen
>> with Erlang/OTP, it's hopefully a straightforward bug.
>>
>> Alas, my test client is in Java (using httpclient 4.0, fwiw), so I
>> can't easily post a unit test for this right now.
>>
>> B.
>>
>> On Sat, Oct 3, 2009 at 1:52 PM, Robert Newson <[email protected]>
>> wrote:
>>> A subsequent run that encountered the connection refused error did not
>>> cause the couch_uuids supervisor to restart it, so the two problems
>>> are unrelated.
>>>
>>> On Sat, Oct 3, 2009 at 1:50 PM, Robert Newson <[email protected]>
>>> wrote:
>>>> Hi,
>>>>
>>>> Jan suggested I start a thread on dev about a problem I'm encountering
>>>> on couchdb trunk. I'm performing long running insertion tests (that
>>>> is, millions of inserts) in order to quantify the differences between
>>>> batch vs. sync and random identifiers vs. sequential ones. I find it
>>>> hard to complete a 5 million insertion run as my client eventually
>>>> (and randomly) gets a "connection refused" error from couchdb.
>>>> Immediately after that occurs, I can successfully hit couchdb with
>>>> curl, so it's transitory. I found the following errors in the log
>>>> around the time of the problem;
>>>>
>>>> =SUPERVISOR REPORT==== 3-Oct-2009::13:32:18 ===
>>>>     Supervisor: {local,couch_secondary_services}
>>>>     Context: shutdown
>>>>     Reason: reached_max_restart_intensity
>>>>     Offender: [{pid,<0.5273.0>},
>>>>                {name,uuids},
>>>>                {mfa,{couch_uuids,start,[]}},
>>>>                {restart_type,permanent},
>>>>                {shutdown,brutal_kill},
>>>>                {child_type,worker}]
>>>>
>>>> [error] [<0.76.0>] {error_report,<0.30.0>,
>>>>     {<0.76.0>,supervisor_report,
>>>>      [{supervisor,{local,couch_server_sup}},
>>>>       {errorContext,child_terminated},
>>>>       {reason,shutdown},
>>>>       {offender,
>>>>        [{pid,<0.2218.0>},
>>>>         {name,couch_secondary_services},
>>>>         {mfa,{couch_server_sup,start_secondary_services,[]}},
>>>>         {restart_type,permanent},
>>>>         {shutdown,infinity},
>>>>         {child_type,supervisor}]}]}}
>>>>
>>>> =SUPERVISOR REPORT==== 3-Oct-2009::13:32:18 ===
>>>>     Supervisor: {local,couch_server_sup}
>>>>     Context: child_terminated
>>>>     Reason: shutdown
>>>>     Offender: [{pid,<0.2218.0>},
>>>>                {name,couch_secondary_services},
>>>>                {mfa,{couch_server_sup,start_secondary_services,[]}},
>>>>                {restart_type,permanent},
>>>>                {shutdown,infinity},
>>>>                {child_type,supervisor}]
>>>>
>>>>
>>>> =ERROR REPORT==== 3-Oct-2009::13:32:18 ===
>>>> Error in process <0.5316.0> with exit value:
>>>> {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_databases},-1}]},{couch_stats_collector,decrement,1}]}
>>>>
>>>>
>>>> =ERROR REPORT==== 3-Oct-2009::13:32:18 ===
>>>> Error in process <0.5312.0> with exit value:
>>>> {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_os_files},-1}]},{couch_stats_collector,decrement,1}]}
>>>>
>>>
>>
>
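
For anyone who wants to try the four-step loop quoted above without the Java client, here is a rough Python sketch of it. It is not the original test. The base URL, the database name `uuid_repro`, and the helper names are all assumptions made for illustration, and the database is assumed to exist already. The thread writes the config path as `_config/uuid/algorithm`; the sketch uses `uuids`, which is the section name CouchDB's configuration uses. Documents are POSTed without an `_id` so that the server has to generate one.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5984"  # assumption: a local CouchDB on the default port
DB_NAME = "uuid_repro"              # hypothetical database name, must already exist


def http_request(method, url, body):
    """Send `body` as JSON and return the HTTP status code."""
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def run_cycle(send, docs_per_step=10):
    """One pass over steps 1) to 4): switch the algorithm, insert, switch, insert."""
    for algorithm in ("random", "sequential"):
        # Steps 1) and 3): the config value is sent as a JSON string.
        send("PUT", f"{BASE_URL}/_config/uuids/algorithm", algorithm)
        # Steps 2) and 4): POST without an _id so the server assigns a UUID.
        for i in range(docs_per_step):
            send("POST", f"{BASE_URL}/{DB_NAME}", {"n": i})


def reproduce(send=http_request, max_cycles=1000):
    """Loop run_cycle until a request fails.

    Returns (cycle_index, exception) for the first failure, or None if
    every cycle completed. urllib's URLError and ConnectionRefusedError
    are both subclasses of OSError, so HTTP error responses stop the
    loop as well as refused connections.
    """
    for cycle in range(max_cycles):
        try:
            run_cycle(send)
        except OSError as exc:
            return cycle, exc
    return None
```

Calling `reproduce()` against a running server loops until the first failed request and returns the cycle it happened in. Passing a different `send` callable makes it possible to dry-run the loop without a server.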
