FYI, once INFRA-12849 is done, I plan to have ITBLL (with 1 billion linked-list nodes) running nightly in GCE via clusterdock. I think we'd trust our testing more if we didn't always have to work out whether a failure we see is new, and a history of results for different branches should help with that.

On Friday, November 4, 2016, Andrew Purtell <[email protected]> wrote:

> That wasn't my question. At all.
>
>
> On Nov 4, 2016, at 7:27 PM, Ted Yu <[email protected]> wrote:
> >
> > I looked at AssignmentManager#onRegionMerge() between branch-1.1
> > and branch-1.2
> >
> > AFAICT, there is no obvious divergence.
> >
> > Later on, I plan to compare the diff between output for 'git log
> > hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java'
> > and see which JIRAs were unique to branch-1.2
> >
> > Cheers
> >
> >> On Fri, Nov 4, 2016 at 6:37 PM, Andrew Purtell <[email protected]> wrote:
> >>
> >> I'm not deeply familiar with the AssignmentManager. I see when we process
> >> split rollbacks in onRegionSplit() we only call regionOffline() on
> >> daughters if they are known to exist. However when processing merge
> >> rollbacks in the else case of onRegionMerge() we unconditionally call
> >> regionOffline() on the parent-being-merged. Shouldn't that likewise be
> >> conditional on regionStates holding a state for the parent-being-merged?
> >> Pardon if I've missed something.
> >>
> >>
> >> On Fri, Nov 4, 2016 at 5:05 PM, Andrew Purtell <[email protected]>
> >> wrote:
> >>
> >>> Thanks. Yes I have been eyeing HBASE-16093. There might be another corner
> >>> case there.
> >>>
> >>>
> >>> On Fri, Nov 4, 2016 at 4:41 PM, Gary Helmling <[email protected]> wrote:
> >>>
> >>>>>
> >>>>> The behavior: Looks like failed split/compaction rollback: row(s) in META
> >>>>> without HRegionInfo, regions deployed without valid meta entries (at
> >>>>> first), regions on HDFS without valid meta entries (later, after RS
> >>>>> carrying them are killed by chaos), holes in the region chain leading to
> >>>>> timeouts and job failure.
> >>>>>
> >>>>>
> >>>> The empty regioninfo in meta sounds like HBASE-16093, though that fix is in
> >>>> 1.2. Interested to see if there are other problems around splits though.
> >>>> Do you have a JIRA yet for tracking?
> >>>>
> >>>>
> >>>>>
> >>>>> You'll know you have found it when on the ITBLL console its meta scanner
> >>>>> starts complaining about rows in meta without serialized HRegionInfo.
> >>>>>
> >>>>>
> >>>> Will keep an eye out for this in our ITBLL runs here.
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>> - Andy
> >>>
> >>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >>> (via Tom White)
> >>>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >>
> >> - Andy
> >>
> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >> (via Tom White)
> >>
>

--
-Dima
