Ok, I've got a little further. If I change my test to much shorter runs (even 10 documents), I can reproduce the connection refused symptom and the stacktrace I pasted originally in under a minute, every time.
What appears to be happening is that the couch_uuids gen_server is failing (being restarted too frequently), part of the supervision tree is torn down and rebuilt, and a concurrent write operation fails while that is happening. Since I'm pretty sure that's not what should happen with Erlang/OTP, it's hopefully a straightforward bug. Alas, my test client is in Java (using httpclient 4.0, fwiw), so I can't easily post a unit test for this right now.

B.

On Sat, Oct 3, 2009 at 1:52 PM, Robert Newson <[email protected]> wrote:
> A subsequent run that encountered the connection refused error did not
> cause the couch_uuids supervisor to restart it, so the two problems
> are unrelated.
>
> On Sat, Oct 3, 2009 at 1:50 PM, Robert Newson <[email protected]> wrote:
>> Hi,
>>
>> Jan suggested I start a thread on dev about a problem I'm encountering
>> on couchdb trunk. I'm performing long running insertion tests (that
>> is, millions of inserts) in order to quantify the differences between
>> batch vs. sync and random identifiers vs. sequential ones. I find it
>> hard to complete a 5 million insertion run as my client eventually
>> (and randomly) gets a "connection refused" error from couchdb.
>> Immediately after that occurs, I can successfully hit couchdb with
>> curl, so it's transitory. I found the following errors in the log
>> around the time of the problem;
>>
>> =SUPERVISOR REPORT==== 3-Oct-2009::13:32:18 ===
>>      Supervisor: {local,couch_secondary_services}
>>      Context:    shutdown
>>      Reason:     reached_max_restart_intensity
>>      Offender:   [{pid,<0.5273.0>},
>>                   {name,uuids},
>>                   {mfa,{couch_uuids,start,[]}},
>>                   {restart_type,permanent},
>>                   {shutdown,brutal_kill},
>>                   {child_type,worker}]
>>
>> [error] [<0.76.0>] {error_report,<0.30.0>,
>>     {<0.76.0>,supervisor_report,
>>      [{supervisor,{local,couch_server_sup}},
>>       {errorContext,child_terminated},
>>       {reason,shutdown},
>>       {offender,
>>           [{pid,<0.2218.0>},
>>            {name,couch_secondary_services},
>>            {mfa,{couch_server_sup,start_secondary_services,[]}},
>>            {restart_type,permanent},
>>            {shutdown,infinity},
>>            {child_type,supervisor}]}]}}
>>
>> =SUPERVISOR REPORT==== 3-Oct-2009::13:32:18 ===
>>      Supervisor: {local,couch_server_sup}
>>      Context:    child_terminated
>>      Reason:     shutdown
>>      Offender:   [{pid,<0.2218.0>},
>>                   {name,couch_secondary_services},
>>                   {mfa,{couch_server_sup,start_secondary_services,[]}},
>>                   {restart_type,permanent},
>>                   {shutdown,infinity},
>>                   {child_type,supervisor}]
>>
>>
>> =ERROR REPORT==== 3-Oct-2009::13:32:18 ===
>> Error in process <0.5316.0> with exit value:
>> {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_databases},-1}]},{couch_stats_collector,decrement,1}]}
>>
>>
>> =ERROR REPORT==== 3-Oct-2009::13:32:18 ===
>> Error in process <0.5312.0> with exit value:
>> {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_os_files},-1}]},{couch_stats_collector,decrement,1}]}
>>
>
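
For anyone reading the reached_max_restart_intensity report above without the OTP background: a supervisor that has to restart its children more than MaxR times within MaxT seconds gives up, terminates all of its children and then itself. The Python below is only a toy model of that rule, not CouchDB or OTP code. The class name, the method name and the limits used (3 restarts in 10 seconds) are hypothetical; the values CouchDB actually configures for couch_secondary_services are not stated in this thread.

```python
from collections import deque


class RestartIntensity:
    """Toy model of an OTP-style restart-intensity check (hypothetical names)."""

    def __init__(self, max_restarts, period_seconds):
        self.max_restarts = max_restarts      # MaxR in OTP terms
        self.period_seconds = period_seconds  # MaxT in OTP terms
        self._restarts = deque()              # timestamps of recent restarts

    def record_restart(self, now):
        """Record a child restart at time `now` (in seconds).

        Returns True if the child may be restarted, False if the limit has
        been exceeded and the supervisor would shut itself down instead.
        """
        self._restarts.append(now)
        # Forget restarts that have fallen out of the sliding window.
        while now - self._restarts[0] > self.period_seconds:
            self._restarts.popleft()
        return len(self._restarts) <= self.max_restarts


if __name__ == "__main__":
    sup = RestartIntensity(max_restarts=3, period_seconds=10)
    for t in (0, 1, 2, 3):  # four crashes in quick succession
        print(t, sup.record_restart(t))
```

With these made-up limits the fourth crash in quick succession returns False. That corresponds to the point in the log where couch_secondary_services shuts down and couch_server_sup then reports child_terminated for it.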
