I'm a little concerned that two of the +1's mention failures. The one time we're supposed to have a clean run-through, 50% of the participants noticed a failure. That doesn't instill much confidence in me.

On Thu, Sep 25, 2014 at 2:18 PM, Josh Elser <[email protected]> wrote:

> Please make a ticket for it and supply the MAC directories for the test
> and the failsafe output.
>
> It doesn't fail for me. It's possible that there is some edge case that
> you and Bill are hitting that I'm not.
>
> Corey Nolet wrote:
>
>> I'm seeing the behavior under Mac OS X and Fedora 19 and they have been
>> consistently failing for me. I'm thinking ACCUMULO-3073. Since others are
>> able to get it to pass, I did not think it should fail the vote solely on
>> that but I do think it needs attention, quickly.
>>
>> On Thu, Sep 25, 2014 at 10:43 AM, Bill Havanki <[email protected]>
>> wrote:
>>
>>> I haven't had an opportunity to try it again since my +1, but prior to
>>> that it has been consistently failing.
>>>
>>> - I tried extending the timeout on the test, but it would still time out.
>>> - I see the behavior on Mac OS X and under CentOS. (I wonder if it's a
>>>   JVM thing?)
>>>
>>> On Wed, Sep 24, 2014 at 9:06 PM, Corey Nolet <[email protected]> wrote:
>>>
>>>> Vote passes with 4 +1's and no -1's.
>>>>
>>>> Bill, were you able to get the IT to run yet? I'm still having timeouts
>>>> on my end as well.
>>>>
>>>> On Wed, Sep 24, 2014 at 1:41 PM, Josh Elser <[email protected]> wrote:
>>>>
>>>>> The crux of it is that both of the errors in the CRC were single bit
>>>>> "variants".
>>>>>
>>>>> y instead of 9 and p instead of 0
>>>>>
>>>>> Both of these cases are a '1' in the most significant bit of the byte
>>>>> instead of a '0'. We recognized these because y and p are outside of
>>>>> the hex range. Fixing both of these fixes the CRC error (manually
>>>>> verified).
>>>>>
>>>>> That's all we know right now. I'm currently running memtest86. I do not
>>>>> have ECC ram, so it *is* theoretically possible that was the cause. After
>>>>> running memtest for a day or so (or until I need my desktop functional
>>>>> again), I'll go back and see if I can reproduce this again.
>>>>>
>>>>> Mike Drob wrote:
>>>>>
>>>>>> Any chance the IRC chats can make it onto the ML for posterity?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Wed, Sep 24, 2014 at 12:04 PM, Keith Turner <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On Wed, Sep 24, 2014 at 12:44 PM, Russ Weeks <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Interesting that "y" (0x79) and "9" (0x39) are one bit "away" from
>>>>>>>> each other. I blame cosmic rays!
>>>>>>>
>>>>>>> It is interesting, and that's only half of the story. It's been
>>>>>>> interesting chatting w/ Josh about this on irc and hearing about his
>>>>>>> findings.
>>>>>>>
>>>>>>>> On Wed, Sep 24, 2014 at 9:05 AM, Josh Elser <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The offending keys are:
>>>>>>>>>
>>>>>>>>>> 389a85668b6ebf8e 2ff6:4a78 [] 1411499115242
>>>>>>>>>> 3a10885b-d481-4d00-be00-0477e231ey65:000000008576b169:0cd98965c9ccc1d0:ba15529e
>>>>>>>>>
>>>>>>>>> The careful eye will notice that the UUID in the first component of
>>>>>>>>> the value has a different suffix than the next corrupt key/value
>>>>>>>>> (ends with "ey65" instead of "e965"). Fixing this in the Value and
>>>>>>>>> re-running the CRC makes it pass.
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>> 7e56b58a0c7df128 5fa0:6249 [] 1411499311578
>>>>>>>>>> 3a10885b-d481-4d00-be00-0477e231e965:0000p000872d60eb:499fa72752d82a7c:5c5f19e8
>>>
>>> --
>>> // Bill Havanki
>>> // Solutions Architect, Cloudera Govt Solutions
>>> // 443.686.9283
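
For anyone who wants to double-check the single-bit observation quoted above, here is a minimal sketch in Python. The helper names (`flipped_bits`, `non_hex_positions`) are made up for illustration and are not part of Accumulo or of any tooling used in the thread. It XORs each corrupted character against the expected one to list the differing bit positions, and it scans a field for characters outside the hex range, which is how the thread says the bad bytes were spotted. Both pairs differ only in the 0x40 bit (bit 6 when counting from 0), which is the top bit of a 7-bit ASCII code.

```python
import string

# Lowercase hex digits, as they appear in the values quoted in the thread.
HEX_CHARS = set(string.hexdigits.lower())


def flipped_bits(bad: str, good: str) -> list[int]:
    """Return the bit positions (0 = least significant) where two characters differ."""
    diff = ord(bad) ^ ord(good)
    return [i for i in range(8) if diff & (1 << i)]


def non_hex_positions(field: str) -> list[tuple[int, str]]:
    """Return (index, character) for every character that is neither hex nor a dash."""
    return [(i, c) for i, c in enumerate(field) if c != "-" and c not in HEX_CHARS]


if __name__ == "__main__":
    # Corrupted character vs. the character that makes the CRC pass.
    for bad, good in (("y", "9"), ("p", "0")):
        print(bad, hex(ord(bad)), good, hex(ord(good)), flipped_bits(bad, good))

    # Fields copied from the two offending values quoted in the thread.
    for field in ("3a10885b-d481-4d00-be00-0477e231ey65", "0000p000872d60eb"):
        print(field, non_hex_positions(field))
```

This only confirms that each corruption is a one-bit difference; it says nothing about whether the cause was RAM, disk, or something else.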
