Yep, that seems reasonable. Let me know when I can test again. :)
On Mon, Jan 23, 2012 at 1:34 AM, Filipe David Manana <[email protected]>wrote:
> Noah, was able to reproduce your issue by tweaking the test to create
> more leaf revisions for a document:
>
> diff --git a/test/etap/242-replication-many-leaves.t
> b/test/etap/242-replication-many-leaves.t
> index d8d3eb9..4eb4765 100755
> --- a/test/etap/242-replication-many-leaves.t
> +++ b/test/etap/242-replication-many-leaves.t
> @@ -56,7 +56,7 @@ doc_ids() ->
>
> doc_num_conflicts(<<"doc1">>) -> 10;
> doc_num_conflicts(<<"doc2">>) -> 100;
> -doc_num_conflicts(<<"doc3">>) -> 286.
> +doc_num_conflicts(<<"doc3">>) -> 500.
>
>
> main(_) ->
>
>
> With that change, I get exactly the same timeout as you get when the
> test runs the push replication. It turns out that some _bulk_docs
> requests are taking more than 30 seconds (default replication
> connection timeout) therefore the replication request retry messages.
> Verified this by timing the _bulk_docs handler to log the time it
> takes:
>
> diff --git a/src/couchdb/couch_httpd_db.erl
> b/src/couchdb/couch_httpd_db.erl
> index d7ecb4a..442571d 100644
> --- a/src/couchdb/couch_httpd_db.erl
> +++ b/src/couchdb/couch_httpd_db.erl
> @@ -297,6 +297,7 @@
> db_req(#httpd{path_parts=[_,<<"_ensure_full_commit">>]}=Req, _Db) ->
> send_method_not_allowed(Req, "POST");
>
> db_req(#httpd{method='POST',path_parts=[_,<<"_bulk_docs">>]}=Req, Db) ->
> + T0 = now(),
> couch_stats_collector:increment({httpd, bulk_requests}),
> couch_httpd:validate_ctype(Req, "application/json"),
> {JsonProps} = couch_httpd:json_body_obj(Req),
> @@ -357,7 +358,9 @@
> db_req(#httpd{method='POST',path_parts=[_,<<"_bulk_docs">>]}=Req, Db)
> ->
> {ok, Errors} = couch_db:update_docs(Db, Docs, Options,
> replicated_changes),
> ErrorsJson =
> lists:map(fun update_doc_result_to_json/1, Errors),
> - send_json(Req, 201, ErrorsJson)
> + Rr = send_json(Req, 201, ErrorsJson),
> + ?LOG_ERROR("BULK DOCS took ~p ms~n",
> [timer:now_diff(now(), T0) / 1000]),
> + Rr
> end
> end;
> db_req(#httpd{path_parts=[_,<<"_bulk_docs">>]}=Req, _Db) ->
>
>
> I was getting _bulk_docs response times after 50 seconds.
>
> This convinces me there's nothing wrong with the codebase, the
> timeouts just needs to be increased:
>
> diff --git a/test/etap/242-replication-many-leaves.t
> b/test/etap/242-replication-many-leaves.t
> index d8d3eb9..6508112 100755
> --- a/test/etap/242-replication-many-leaves.t
> +++ b/test/etap/242-replication-many-leaves.t
> @@ -77,6 +77,7 @@ test() ->
> couch_server_sup:start_link(test_util:config_files()),
> ibrowse:start(),
> crypto:start(),
> + couch_config:set("replicator", "connection_timeout", "90000", false),
>
> Pairs = [
> {source_db_name(), target_db_name()},
> @@ -287,6 +288,6 @@ replicate(Source, Target) ->
> receive
> {'DOWN', MonRef, process, Pid, Reason} ->
> etap:is(Reason, normal, "Replication finished successfully")
> - after 300000 ->
> + after 900000 ->
> etap:bail("Timeout waiting for replication to finish")
> end.
>
> Alternatively the test can be updated to create less revisions for the
> document doc3. The current revisions # is 286 but for the tests'
> purpose 205+ is enough, which should make it faster - 7000 (max url
> length) / length(DocRevision) = 205
>
> If it's ok for you, updating the timeouts plus reducing the # from 286
> to 210 is fine for me.
>
>
>
> On Mon, Jan 23, 2012 at 12:00 AM, Noah Slater <[email protected]>
> wrote:
> > I'm just the dumb QA guy.
> >
> > If you have some diagnostics you want me to run on my machine, I am happy
> > to.
> >
> > On Sun, Jan 22, 2012 at 11:31 PM, Filipe David Manana
> > <[email protected]>wrote:
> >
> >> On Sun, Jan 22, 2012 at 7:20 PM, Noah Slater <[email protected]>
> wrote:
> >> > Works. How do we proceed?
> >>
> >> For how much time does the test runs? On 2 different physical
> >> machines, it takes about 1 minute and 10 seconds for me.
> >>
> >> Perhaps some manual replication tests could confirm if there's
> >> something wrong with the codebase, your environment or if simply
> >> increasing the timeout is not alarming.
> >>
> >> >
> >> > On Sun, Jan 22, 2012 at 7:05 PM, Noah Slater <[email protected]>
> >> wrote:
> >> >
> >> >> OVAR 9000! (Testing now...)
> >> >>
> >> >>
> >> >> On Sun, Jan 22, 2012 at 6:56 PM, Filipe David Manana <
> >> [email protected]>wrote:
> >> >>
> >> >>> On Sun, Jan 22, 2012 at 6:47 PM, Noah Slater <[email protected]>
> >> >>> wrote:
> >> >>> > No change, still fails.
> >> >>>
> >> >>> Noah, to try to find out if it's due to slowness of the machine or
> >> >>> some other issue, do you think you can try to increase the following
> >> >>> timeout in the test?
> >> >>>
> >> >>> diff --git a/test/etap/242-replication-many-leaves.t
> >> >>> b/test/etap/242-replication-many-leaves.t
> >> >>> index d8d3eb9..737cd31 100755
> >> >>> --- a/test/etap/242-replication-many-leaves.t
> >> >>> +++ b/test/etap/242-replication-many-leaves.t
> >> >>> @@ -287,6 +287,6 @@ replicate(Source, Target) ->
> >> >>> receive
> >> >>> {'DOWN', MonRef, process, Pid, Reason} ->
> >> >>> etap:is(Reason, normal, "Replication finished successfully")
> >> >>> - after 300000 ->
> >> >>> + after 900000 ->
> >> >>> etap:bail("Timeout waiting for replication to finish")
> >> >>> end.
> >> >>>
> >> >>> >
> >> >>> > On Sun, Jan 22, 2012 at 6:08 PM, Noah Slater <
> [email protected]>
> >> >>> wrote:
> >> >>> >
> >> >>> >>
> >> >>> >> On Sun, Jan 22, 2012 at 6:01 PM, Filipe David Manana <
> >> >>> [email protected]>wrote:
> >> >>> >>
> >> >>> >>> Noah, does it fail occasionally or every time for you?
> >> >>> >>>
> >> >>> >>
> >> >>> >> Fails every time.
> >> >>> >>
> >> >>> >>
> >> >>> >>> I'm assuming you're with a slow machine or the machine is a bit
> >> >>> >>> overloaded.
> >> >>> >>>
> >> >>> >>
> >> >>> >> Shouldn't be, I'm not doing anything else right now, and this is
> a
> >> new
> >> >>> MBA.
> >> >>> >>
> >> >>> >>
> >> >>> >>> Can you try with the following patch?
> >> >>> >>>
> >> >>> >>
> >> >>> >> Yes. Will report back.
> >> >>> >>
> >> >>> >>
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Filipe David Manana,
> >> >>>
> >> >>> "Reasonable men adapt themselves to the world.
> >> >>> Unreasonable men adapt the world to themselves.
> >> >>> That's why all progress depends on unreasonable men."
> >> >>>
> >> >>
> >> >>
> >>
> >>
> >>
> >> --
> >> Filipe David Manana,
> >>
> >> "Reasonable men adapt themselves to the world.
> >> Unreasonable men adapt the world to themselves.
> >> That's why all progress depends on unreasonable men."
> >>
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."
>