This patch seems to work well; the uuids Futon test is also more stable now. Before, you could run it repeatedly and see random failures. -- Bob


On Oct 3, 2009, at 10:35 AM, Robert Newson wrote:

This is now COUCHDB-517 and includes a patch that fixes the problem.

B.

On Sat, Oct 3, 2009 at 2:34 PM, Robert Newson <[email protected]> wrote:
I should point out that my test does this;

1) PUT _config/uuid/algorithm with "random"
2) insert some documents
3) PUT _config/uuid/algorithm with "sequential"
4) insert some documents

If you loop that, and insert as few as 10 documents at 2) and 4), you
will get a connection refused and the stacktrace output, within 60
seconds.
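
For anyone who wants to try this without the Java client, the loop above could be sketched roughly as follows. This is only an illustration, not the test that was actually run: the base URL, the database name `uuid_test`, and the helper names are assumptions, and the config path is copied as written in the steps above. Documents are POSTed without an `_id` so that the server has to generate one.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5984"       # assumption: a local CouchDB on the default port
DB_NAME = "uuid_test"                    # hypothetical database, must already exist
CONFIG_PATH = "/_config/uuid/algorithm"  # path as written in the steps above


def plan_iteration(docs_per_step=10):
    """Build the (method, path, body) requests for one pass of steps 1-4."""
    requests = []
    for algorithm in ("random", "sequential"):
        # Steps 1 and 3: switch the uuid algorithm.
        requests.append(("PUT", CONFIG_PATH, json.dumps(algorithm)))
        # Steps 2 and 4: insert documents without an _id,
        # so the server assigns the identifier.
        for i in range(docs_per_step):
            requests.append(("POST", "/" + DB_NAME, json.dumps({"n": i})))
    return requests


def send(method, path, body):
    """Send one request to the server and return the HTTP status code."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=body.encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def run(iterations, sender=send):
    """Repeat the four steps; sender is injectable so the plan can be dry-run."""
    for _ in range(iterations):
        for method, path, body in plan_iteration():
            sender(method, path, body)
```

Calling `run(1000)` against a running server would repeat the four steps a thousand times. Passing a different `sender` lets you inspect the request sequence without touching a server.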


On Sat, Oct 3, 2009 at 2:33 PM, Robert Newson <[email protected] > wrote:
Ok, I've got a little further. If I change my test to much shorter runs
(even 10 documents), I can reproduce the connection refused symptom
and the stacktrace I pasted originally in under a minute, every time.

What appears to be happening is that the couch_uuids gen_server is
failing (being restarted too frequently), part of the supervision tree
is torn down and rebuilt, and a concurrent write operation fails while
that is happening. Since I'm pretty sure that's not what should happen
with Erlang/OTP, it's hopefully a straightforward bug.
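
To illustrate what "restarted too frequently" means here: an OTP supervisor gives up and terminates itself when a child is restarted more than a configured number of times within a configured period. The sketch below models that sliding-window check in Python. It is not the OTP implementation, and the class name and the limits in the usage note are hypothetical, not the values CouchDB configures.

```python
from collections import deque


class RestartWindow:
    """Toy model of a supervisor's restart-intensity check (not OTP code)."""

    def __init__(self, max_restarts, period_seconds):
        self.max_restarts = max_restarts
        self.period = period_seconds
        self.times = deque()  # timestamps of recent restarts

    def record_restart(self, now):
        """Record a restart at time `now` (seconds).

        Returns True if the child may be restarted, False if the
        intensity limit is exceeded and the supervisor would shut down.
        """
        self.times.append(now)
        # Forget restarts that have fallen out of the period.
        while self.times and now - self.times[0] > self.period:
            self.times.popleft()
        return len(self.times) <= self.max_restarts
```

With hypothetical limits of 3 restarts per 10 seconds, a fourth restart inside the window makes `record_restart` return False. That corresponds to the `reached_max_restart_intensity` shutdown in the log, after which the parent supervisor has to rebuild that part of the tree.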

Alas, my test client is in Java (using httpclient 4.0, fwiw), so I
can't easily post a unit test for this right now.

B.

On Sat, Oct 3, 2009 at 1:52 PM, Robert Newson <[email protected] > wrote:
A subsequent run that encountered the connection refused error did not
cause the couch_uuids supervisor to restart it, so the two problems
are unrelated.

On Sat, Oct 3, 2009 at 1:50 PM, Robert Newson <[email protected] > wrote:
Hi,

Jan suggested I start a thread on dev about a problem I'm encountering
on couchdb trunk. I'm performing long-running insertion tests (that is,
millions of inserts) in order to quantify the differences between batch
vs. sync and random identifiers vs. sequential ones. I find it
hard to complete a 5 million insertion run as my client eventually
(and randomly) gets a "connection refused" error from couchdb.
Immediately after that occurs, I can successfully hit couchdb with
curl, so it's transitory. I found the following errors in the log
around the time of the problem;

=SUPERVISOR REPORT==== 3-Oct-2009::13:32:18 ===
    Supervisor: {local,couch_secondary_services}
    Context:    shutdown
    Reason:     reached_max_restart_intensity
    Offender:   [{pid,<0.5273.0>},
                 {name,uuids},
                 {mfa,{couch_uuids,start,[]}},
                 {restart_type,permanent},
                 {shutdown,brutal_kill},
                 {child_type,worker}]

[error] [<0.76.0>] {error_report,<0.30.0>,
   {<0.76.0>,supervisor_report,
    [{supervisor,{local,couch_server_sup}},
     {errorContext,child_terminated},
     {reason,shutdown},
     {offender,
         [{pid,<0.2218.0>},
          {name,couch_secondary_services},
          {mfa,{couch_server_sup,start_secondary_services,[]}},
          {restart_type,permanent},
          {shutdown,infinity},
          {child_type,supervisor}]}]}}

=SUPERVISOR REPORT==== 3-Oct-2009::13:32:18 ===
    Supervisor: {local,couch_server_sup}
    Context:    child_terminated
    Reason:     shutdown
    Offender:   [{pid,<0.2218.0>},
                 {name,couch_secondary_services},
                 {mfa,{couch_server_sup,start_secondary_services,[]}},
                 {restart_type,permanent},
                 {shutdown,infinity},
                 {child_type,supervisor}]


=ERROR REPORT==== 3-Oct-2009::13:32:18 ===
Error in process <0.5316.0> with exit value:
{badarg,[{ets,insert,[stats_hit_table, {{couchdb,open_databases},-1}]},{couch_stats_collector,decrement, 1}]}


=ERROR REPORT==== 3-Oct-2009::13:32:18 ===
Error in process <0.5312.0> with exit value:
{badarg,[{ets,insert,[stats_hit_table, {{couchdb,open_os_files},-1}]},{couch_stats_collector,decrement, 1}]}




