Re: [DISCUSS] Direction of HBCK2

Wellington Chevreuil Wed, 29 May 2019 05:48:01 -0700

Thanks Toshihiro! I guess those can still be fixed with some combination
of the commands available today, such as merge/assign. Of course, that
requires some extra scripting and log reading in cases where many regions
are in an inconsistent state, so maybe we should work on providing a
one-liner command that relies on the existing ones. We should also focus on
fixing the root causes of such problems, as a means of reducing the need
for such commands.
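
As a rough illustration of the extra scripting mentioned above, here is a
minimal Python sketch that turns a list of encoded region names into batched
HBCK2 `assigns` invocations. The jar path, the batch size, and the function
name are assumptions made for illustration; the sketch only builds command
strings and does not run anything against a cluster.

```python
import shlex


def build_assign_commands(encoded_regions, jar="hbase-hbck2.jar", batch_size=50):
    """Return one shell command string per batch of encoded region names.

    `jar` is a hypothetical path to the HBCK2 jar; adjust it to the local
    install. `batch_size` is an arbitrary choice to keep each line short.
    """
    # Drop blanks and duplicates while keeping first-seen order.
    names = list(dict.fromkeys(r.strip() for r in encoded_regions if r.strip()))
    commands = []
    for i in range(0, len(names), batch_size):
        batch = names[i:i + batch_size]
        argv = ["hbase", "hbck", "-j", jar, "assigns", *batch]
        # Quote every argument so the line can be pasted into a shell.
        commands.append(" ".join(shlex.quote(a) for a in argv))
    return commands
```

The region names would typically come from log reading or from a scan of
hbase:meta; reviewing the generated lines before running them keeps the
operator in control of what gets assigned.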


I'm not really a fan of OfflineMetaRepair, nor of hbck1's fix for
holes/overlaps, and would rather not have those back. Sure, those are easy
and convenient to trigger, but hbck1 reports are sometimes misleading (for
instance, it reports holes when region(s) in the chain is/are simply not
online), and that, combined with the availability of such heavy hammers,
has led inexperienced operators to run them and end up in a worse state.
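
To make the hole/overlap distinction concrete, here is a minimal Python
sketch of a region-chain check. It assumes the input is the full list of
(start_key, end_key) pairs for one table as recorded in hbase:meta, whether
or not each region is online, with an empty key marking the open start or end
of the table. The function name and input shape are hypothetical; this is not
how hbck1 or HBCK2 implement their checks.

```python
def find_holes_and_overlaps(regions):
    """Scan (start_key, end_key) byte-string pairs for holes and overlaps.

    Returns (holes, overlaps): holes are uncovered (from_key, to_key) ranges,
    overlaps are the regions that start inside already-covered key space.
    An empty region list returns two empty lists.
    """
    ordered = sorted(regions, key=lambda r: r[0])
    holes, overlaps = [], []
    covered_end = b""          # keys before this point are already covered
    reached_table_end = False  # True once a region with an open end is seen
    for start, end in ordered:
        if reached_table_end or start < covered_end:
            overlaps.append((start, end))
        elif start > covered_end:
            holes.append((covered_end, start))
        if end == b"":
            reached_table_end = True
        elif not reached_table_end and end > covered_end:
            covered_end = end
    if ordered and not reached_table_end:
        # The chain stops before the end of the table.
        holes.append((covered_end, b""))
    return holes, overlaps
```

Feeding it only the regions that are currently online would reproduce the
misleading reports described above, since an offline region would show up
as a hole.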

On Wed, May 29, 2019 at 13:22, Toshihiro Suzuki <[email protected]>
wrote:

> Hi Wellington,
>
> I actually saw table holes in a customer's cluster, and I just fixed the
> issues with the workaround I mentioned in HBASE-21665
> <https://issues.apache.org/jira/browse/HBASE-21665>. I didn't dig into why
> the table holes happened at that time because the customer didn't want me
> to.
>
> However, IMO, whatever the reason, I think we should have a direct way to
> fix holes and overlaps.
>
> On Wed, May 29, 2019 at 4:57 AM Wellington Chevreuil <
> [email protected]> wrote:
>
> > So JMS, Toshihiro, it seems like upgrading from some 1.x to 2.x
> > consistently triggers this problem? Do you guys know if there are any
> > bug jiras open that would cover these scenarios? If not, and if you guys
> > have enough resources to investigate it, maybe it's worth opening a
> > specific jira?
> >
> > On Wed, May 29, 2019 at 11:40, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> > > Personally, when I tried to upgrade from 1.4.x to 2.2.x I ended up in
> > > a situation where my meta was empty and had to be repaired, but there
> > > was no OfflineMetaRepair for 2.2.x, so I just had to delete all my
> > > tables, get a brand new installation, recreate the tables and bulkload
> > > the data back into them. I would have been happy to have an
> > > OfflineMetaRepair.
> > >
> > > But it's more like an experimental cluster than a production one...
> > >
> > > JMS
> > >
> > > On Wed, May 29, 2019 at 06:36, Wellington Chevreuil <
> > > [email protected]> wrote:
> > >
> > > > Interesting, I haven't seen any cases where OfflineMetaRepair was
> > > > really required among our customer base (running
> > > > cdh6.1.x/hbase2.1.1, cdh6.2/hbase2.1.2). The majority of RIT issues I
> > > > have come across on hbase 2.x were related to AP/SCP failures, most
> > > > of which could be sorted out with the hbck2 commands available by
> > > > then (in some cases, this required some CLI scripting to build up a
> > > > "bulk" assign command).
> > > >
> > > > On Wed, May 29, 2019 at 00:55, Toshihiro Suzuki <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi Josh,
> > > > >
> > > > > Thank you for the explanation. I agree with the direction for
> > > > > HBCK2.
> > > > >
> > > > > The problem I wanted to point out in the Jira is that, until we
> > > > > implement the features you mentioned, we don't have any direct way
> > > > > to fix holes and overlaps. Holes and overlaps can be created by
> > > > > bugs or operational errors, so I think we should be able to fix
> > > > > these issues.
> > > > >
> > > > > I thought OfflineMetaRepair could be a workaround for the issues
> > > > > until we implement the features of HBCK2.
> > > > >
> > > > > Regards,
> > > > > Toshi
> > > > >
> > > > >
> > > > > On Tue, May 28, 2019 at 9:12 AM Josh Elser <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Context: https://issues.apache.org/jira/browse/HBASE-21665
> > > > > >
> > > > > > I left a comment on the above issue about what I thought good
> > > > > > things to build into HBCK2 would be -- a focus on specific
> > > > > > "primitive" operations that an admin/operator could use to help
> > > > > > repair an otherwise broken HBase installation. Some examples I
> > > > > > had in my head were:
> > > > > >
> > > > > > * Create an empty region (to plug a hole)
> > > > > > * Report holes in a region chain
> > > > > >
> > > > > > In my head, the difference for HBCK2 was that we want to give
> > > > > > folks the tools to fix their cluster, but we did not want to own
> > > > > > the "just fix everything" kind of tool that HBCK1 had become. The
> > > > > > problem with HBCK1 was that it was often difficult/problematic
> > > > > > for us to know how to correctly fix a problem (the same problem
> > > > > > could be corrected in different ways).
> > > > > >
> > > > > > Andrew had some confusion about this, so I'm not sure if I'm
> > > > > > off-base or if we're all in agreement on direction and we just
> > > > > > need to do a better job documenting things. Thanks for keeping me
> > > > > > honest either way :)
> > > > > >
> > > > > > And just in case it doesn't go without saying, HBCK2 would be
> > > > > > something that helps fix a system, while we want to always
> > > > > > understand the root cause of how/why we got into a situation
> > > > > > where we needed HBCK2 and also address that.
> > > > > >
> > > > > > - Josh
> > > > > >
> > > > >
> > > >
> > >
> >
>
