Judging by the comments around the code, I had hoped so too, but the
issue looks like OTP-9167 and acts like OTP-9167, so I'm not sure how
else to classify it. However, I also have a feeling that some code is
missing around this note:
https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator.erl#L299

--
,,,^..^,,,


On Sun, May 4, 2014 at 7:43 PM, Robert Samuel Newson <[email protected]> wrote:
> Hrm,  OTP-9167 was reported by Filipe, the main author of the current couchdb 
> replicator, and he also changed how this was handled in couchdb to 
> compensate. This is some ancient stuff, hard to believe it’s the cause of our 
> latest issue. I must be missing something.
>
> On 4 May 2014, at 12:46, Alexander Shorin <[email protected]> wrote:
>
>> On Wed, Apr 30, 2014 at 4:17 PM, Alexander Shorin <[email protected]> wrote:
>>> On Tue, Apr 29, 2014 at 5:56 PM, Alexander Shorin <[email protected]> wrote:
>>>> On Wed, Apr 23, 2014 at 1:03 PM, Mutton, James <[email protected]> wrote:
>>>>> well, bummer.  Tried 3 times on R14B01, all 3 I get:
>>>>> /tmp/couchdb/dist/apache-couchdb-1.6.0/apache-couchdb-1.6.0/_build/../src/couch_replicator/test/07-use-checkpoints.t
>>>>>  .......... Failed 4/16 subtests
>>>>>
>>>>> Test Summary Report
>>>>> -------------------
>>>>> /tmp/couchdb/dist/apache-couchdb-1.6.0/apache-couchdb-1.6.0/_build/../src/couch_replicator/test/07-use-checkpoints.t
>>>>>         (Wstat: 0 Tests: 16 Failed: 4)
>>>>>  Failed tests:  9, 12-13, 15
>>>>> Files=7, Tests=1832, 150 wallclock secs ( 0.81 usr  0.09 sys + 155.32 
>>>>> cusr 13.16 csys = 169.38 CPU)
>>>>> Result: FAIL
>>>>> make[3]: *** [check] Error 1
>>>>>
>>>>> Unfortunately, I’m needing some sleep then leaving on some vacation for 
>>>>> the rest of the week.  I’ll see if I can maybe look closer at what’s 
>>>>> going on locally while on the flight.
>>>>
>>>> I failed to reproduce this with R14B04, but I will try R14B01, as you
>>>> did.
>>>
>>> Confirmed for R14B01.
>>
>> Ok, I've found the root of this issue. It is known as OTP-9167, which
>> was fixed in R14B03, and it is the reason 07-use-checkpoints.t fails on
>> R14B01: the replicator worker cannot be started with a new child spec
>> in which the use_checkpoints flag is flipped, because the supervisor
>> holds on to the initial spec. It sees that a replication with the same
>> id is about to start and restarts it with the old spec, ignoring any
>> changes. I could fix the test, but I cannot fix the root cause, and I'm
>> not sure it is worth searching for workarounds nowadays (R14B03 was
>> released on 2011-05-24, almost 3 years ago).
>>
>> However, here are three solutions that I have:
>>
>> 0. Do nothing.
>> 1. Isolate tests from each other to hide the issue (isolation is good,
>> but hiding bugs is bad):
>> https://www.friendpaste.com/1lnTEFg6RId5PDRAmvbBVO
>> 2. On test failure, check the Erlang version and note that this failure
>> is *fine* for specific versions:
>> https://www.friendpaste.com/3TmqoNjEF3xnYtbLybSL7G
>> 3. Add a "+no_checkpoints" suffix to the replication id if
>> "use_checkpoints: false" was specified. This solves the problem:
>> https://www.friendpaste.com/3TmqoNjEF3xnYtbLybSKpT (yes, some
>> refactoring love is required)
>> But I'm not sure that this is a good idea.
>>
>> Personally, I would prefer to keep this "bug" alive as a reminder that
>> things *could* go wrong with your Erlang version. Your thoughts?
>>
>>
>> --
>> ,,,^..^,,,
>
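
For readers following the quoted analysis, here is a rough toy model of the
failure and of option 3. It is only a sketch in Python, not CouchDB's Erlang
code and not the OTP supervisor: ToySupervisor, start_child and
replication_id are hypothetical names made up for illustration, and the only
behaviour modelled is the one described in the thread (the first spec stored
under an id wins).

```python
# Toy model only: every name here is hypothetical, not a CouchDB or OTP API.


class ToySupervisor:
    """Remembers the first child spec registered under each child id."""

    def __init__(self):
        self._specs = {}

    def start_child(self, child_id, spec):
        # Mimics the behaviour described in the thread: if the id is
        # already known, the stored spec is used and the new one is ignored.
        return self._specs.setdefault(child_id, spec)


def replication_id(base_id, use_checkpoints=True):
    # Option 3: make the id depend on the flag, so a flipped flag yields
    # a different child id instead of colliding with the old one.
    if use_checkpoints:
        return base_id
    return base_id + "+no_checkpoints"


# Same id, flipped flag: the stale spec is reused.
sup = ToySupervisor()
sup.start_child("abc123", {"use_checkpoints": True})
second = sup.start_child("abc123", {"use_checkpoints": False})
stale_spec_reused = second["use_checkpoints"]  # the change was ignored

# With the suffix, the two replications no longer share an id.
sup2 = ToySupervisor()
sup2.start_child(replication_id("abc123", True), {"use_checkpoints": True})
fresh = sup2.start_child(
    replication_id("abc123", False), {"use_checkpoints": False}
)
```

In the first case the second start gets the original spec back; in the second
case the suffixed id is new to the supervisor, so the flipped flag takes
effect. The trade-off raised in the thread still applies: the id now changes
with the option.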
