Bob,

Thanks for your reply.

I wasn't implying we should try to explain anything away. All of these are 
valid concerns; I just wanted to get a better understanding of where the vote 
flips from a +0 to a -1 and, subsequently, how to address that boundary. 
Ideally we can just fix all of the things you mention, but I think it is 
important to understand them in detail, which is why I was going into them. 
Ultimately, I want to understand what we need to do to ship 1.2.0.

On Feb 26, 2012, at 21:22 , Bob Dionne wrote:

> Jan,
> 
> I'm -1 based on all of my evaluation. I've spent a few hours on this release 
> now yesterday and today. It doesn't really pass what I would call the "smoke 
> test". Almost everything I've run into has an explanation:
> 
> 1. crashes out of the box - that's R15B, you need to recompile SSL and Erlang 
> (we'll note on release notes)

Have we spent any time figuring out what the trouble here is?


> 2. etaps hang running make check. Known issue. Our etap code is out of date, 
> recent versions of etap don't even run their own unit tests

I have seen the etap hang as well, but I wasn't diligent enough to report it 
in JIRA. I have done so now (COUCHDB-1424).


> 3. Futon tests fail. Some are known bugs (attachment ranges in Chrome). Both 
> Chrome and Safari also hang

Do you have more details on where Chrome and Safari hang? Can you try their 
private browsing modes and double/triple-check that the caches are empty? Can 
you get to a point where every test passes in at least one browser, even if 
individual tests still fail in one or two of the others?


> 4. standalone JS tests fail. Again most of these run when run by themselves

Which ones?


> 5. performance. I used real production data *because* Stefan on user@ reported 
> performance degradation on his data set. Any numbers are meaningless for a 
> single test. I also ran scripts that BobN and Jason Smith posted that show a 
> difference between 1.1.x and 1.2.x

You are folding an IRC discussion we've had into this thread. The performance 
regression that was reported is a good reason to look into other scenarios 
where we can show slowdowns, but we need to understand what's actually 
happening. Just from looking at dev@, all I see is some handwaving about 
reports a few people have made. (Not to discourage any of the work that has 
been done on IRC and user@, but for the sake of a release vote thread, this 
information needs to be on this mailing list.)

As I said on IRC, I'm happy to get my hands dirty to understand the regression 
at hand. But we need to know where we'd draw a line and say this isn't 
acceptable for a 1.2.0.
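
To make that concrete, here is a rough Python sketch of the kind of 
reproducible, on-list numbers I have in mind. Everything in it is an 
assumption: it expects a local CouchDB on 127.0.0.1:5984 with no admin auth, 
and the database name, document count, payload size and map function are 
made-up placeholders rather than anything from Bob's or Stefan's setup.

import json
import time
import urllib.error
import urllib.request

BASE = "http://127.0.0.1:5984"
DB = "perf_test"            # hypothetical database name
NUM_DOCS = 200000           # roughly the size Bob mentioned
BATCH = 1000
PAYLOAD = "x" * 1024        # 1 KB per doc; vary to see the effect of data size

def request(method, path, body=None):
    # tiny JSON-over-HTTP helper against the CouchDB API
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# start from a fresh database
try:
    request("DELETE", "/" + DB)
except urllib.error.HTTPError:
    pass
request("PUT", "/" + DB)

# deliberately trivial map function, so doc and emit size dominate rather
# than JS complexity
request("PUT", "/%s/_design/perf" % DB, {
    "views": {"by_field": {
        "map": "function(doc) { emit(doc.field, doc.payload); }"}}})

# bulk-load the documents
for start in range(0, NUM_DOCS, BATCH):
    docs = [{"field": i, "payload": PAYLOAD}
            for i in range(start, start + BATCH)]
    request("POST", "/%s/_bulk_docs" % DB, {"docs": docs})

# the first view read triggers the index build; that is what we time
t0 = time.time()
request("GET", "/%s/_design/perf/_view/by_field?limit=1" % DB)
print("view build took %.1f seconds" % (time.time() - t0))

Run the same script against a 1.1.x and a 1.2.x build and post both timings 
here, and we have something concrete to compare; varying PAYLOAD should also 
tell us whether the slowdown tracks the amount of emitted data.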


> 6. Reviewed patch pointed to by Jason that may be the cause but it's hard to 
> say without knowing the code analysis that went into the changes. You can see 
> obvious local optimizations that make good sense but those are often the ones 
> that get you, without knowing the call counts.

That is a point that wasn't included in your previous mail. It's great that 
there is progress, thanks for looking into this!


> Many of these issues can be explained away, but I think end users will be 
> less forgiving. I think we already struggle with view performance. I'm 
> interested to see how others evaluate this regression.
> I'll try this seatoncouch tool you mention later to see if I can construct 
> some more definitive tests.

Again, I'm not trying to explain anything away. I want to get a shared 
understanding of the issues you raised and of where we stand on solving them, 
weighed against the ongoing 1.2.0 release.

And again: thanks for doing this thorough review and for looking into the 
performance issue. I hope that with your help we can understand all of these 
things a lot better very soon :)

Cheers
Jan
-- 


> 
> Best,
> 
> Bob
> On Feb 26, 2012, at 2:29 PM, Jan Lehnardt wrote:
> 
>> 
>> On Feb 26, 2012, at 13:58 , Bob Dionne wrote:
>> 
>>> -1
>>> 
>>> R15B on OS X Lion
>>> 
>>> I rebuilt OTP with an older SSL and that gets past all the crashes (thanks 
>>> Filipe). I still see hangs when running make check, though any particular 
>>> etap that hangs will run ok by itself. The Futon tests never run to 
>>> completion in Chrome without hanging and the standalone JS tests also have 
>>> fails.
>> 
>> What part of this do you consider the -1? Can you try running the JS tests 
>> in Firefox and/or Safari? Can you get all tests to pass at least once across 
>> all browsers? The cli JS suite isn't supposed to work, so that isn't a 
>> criterion. I've seen the hang in make check for R15B while individual tests 
>> run as well, but I don't consider this blocking. While I understand and 
>> support the notion that tests shouldn't fail, period, we gotta work with 
>> what we have and master already has significant improvements. What would you 
>> like to see changed to not -1 this release?
>> 
>>> I tested the performance of view indexing, using a modest 200K doc db with 
>>> a large complex view, and there's a clear regression between 1.1.x and 1.2.x. 
>>> Others report similar results.
>> 
>> What is a large complex view? The complexity of the map/reduce functions is 
>> rarely an indicator of performance, it's usually input doc size and 
>> output/emit()/reduce data size. How big are the docs in your test and how 
>> big is the returned data? I understand the changes for 1.2.x will improve 
>> larger-data scenarios more significantly.
>> 
>> Cheers
>> Jan
>> -- 
>> 
>> 
>> 
>> 
>>> 
>>> On Feb 23, 2012, at 5:25 PM, Bob Dionne wrote:
>>> 
>>>> sorry Noah, I'm in debug mode today so I don't care to start mucking with 
>>>> my stack, recompiling erlang, etc...
>>>> 
>>>> I did try using that build repeatedly and it crashes all the time. I find 
>>>> it very odd and I had seen those before as I said on my older macbook. 
>>>> 
>>>> I do see the hangs Jan describes in the etaps, they have been there right 
>>>> along, so I'm confident this is just the SSL issue. Why it only happens on 
>>>> the build is puzzling, any source build of any branch works just peachy.
>>>> 
>>>> So I'd say I'm +1 based on my use of the 1.2.x branch but I'd like to hear 
>>>> from Stefan, who reported the severe performance regression. BobN seems to 
>>>> think we can ignore that; it's something flaky in that fellow's 
>>>> environment. I tend to agree, but I'm conservative.
>>>> 
>>>> On Feb 23, 2012, at 1:23 PM, Noah Slater wrote:
>>>> 
>>>>> Can someone convince me that this bus error and segfault stuff is not a
>>>>> blocking issue.
>>>>> 
>>>>> Bob tells me that he's followed the steps above and he's still 
>>>>> experiencing
>>>>> the issues.
>>>>> 
>>>>> Bob, you did follow the steps to install your own SSL right?
>>>>> 
>>>>> On Thu, Feb 23, 2012 at 5:09 PM, Jan Lehnardt <j...@apache.org> wrote:
>>>>> 
>>>>>> 
>>>>>> On Feb 23, 2012, at 00:28 , Noah Slater wrote:
>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I would like to call a vote for the Apache CouchDB 1.2.0 release, second
>>>>>> round.
>>>>>>> 
>>>>>>> We encourage the whole community to download and test these
>>>>>>> release artifacts so that any critical issues can be resolved before the
>>>>>>> release is made. Everyone is free to vote on this release, so get stuck
>>>>>> in!
>>>>>>> 
>>>>>>> We are voting on the following release artifacts:
>>>>>>> 
>>>>>>> http://people.apache.org/~nslater/dist/1.2.0/
>>>>>>> 
>>>>>>> 
>>>>>>> These artifacts have been built from the following tree-ish in Git:
>>>>>>> 
>>>>>>> 4cd60f3d1683a3445c3248f48ae064fb573db2a1
>>>>>>> 
>>>>>>> 
>>>>>>> Please follow the test procedure before voting:
>>>>>>> 
>>>>>>> http://wiki.apache.org/couchdb/Test_procedure
>>>>>>> 
>>>>>>> 
>>>>>>> Thank you.
>>>>>>> 
>>>>>>> Happy voting,
>>>>>> 
>>>>>> Signature and hashes check out.
>>>>>> 
>>>>>> Mac OS X 10.7.3, 64bit, SpiderMonkey 1.8.0, Erlang R14B04: make check
>>>>>> works fine, browser tests in Safari work fine.
>>>>>> 
>>>>>> Mac OS X 10.7.3, 64bit, SpiderMonkey 1.8.5, Erlang R14B04: make check
>>>>>> works fine, browser tests in Safari work fine.
>>>>>> 
>>>>>> FreeBSD 9.0, 64bit, SpiderMonkey 1.7.0, Erlang R14B04: make check works
>>>>>> fine, browser tests in Safari work fine.
>>>>>> 
>>>>>> CentOS 6.2, 64bit, SpiderMonkey 1.8.5, Erlang R14B04: make check works
>>>>>> fine, browser tests in Firefox work fine.
>>>>>> 
>>>>>> Ubuntu 11.4, 64bit, SpiderMonkey 1.8.5, Erlang R14B02: make check works
>>>>>> fine, browser tests in Firefox work fine.
>>>>>> 
>>>>>> Ubuntu 10.4, 32bit, SpiderMonkey 1.8.0, Erlang R13B03: make check fails 
>>>>>> in
>>>>>> - 076-file-compression.t: https://gist.github.com/1893373
>>>>>> - 220-compaction-daemon.t: https://gist.github.com/1893387
>>>>>> This one runs in a VM and is 32bit, so I don't know if there's anything in
>>>>>> the tests that rely on 64bittyness or the R14B03. Filipe, I think you
>>>>>> worked on both features, do you have an idea?
>>>>>> 
>>>>>> I tried running it all through Erlang R15B on Mac OS X 10.7.3, but a good
>>>>>> way into `make check` the tests would just stop and hang. The last time,
>>>>>> repeatedly in 160-vhosts.t, but when run alone, that test finished in 
>>>>>> under
>>>>>> five seconds. I'm not sure what the issue is here.
>>>>>> 
>>>>>> Despite the things above, I'm happy to give this a +1 if we put a warning
>>>>>> about R15B on the download page.
>>>>>> 
>>>>>> Great work all!
>>>>>> 
>>>>>> Cheers
>>>>>> Jan
>>>>>> --
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
> 
