Re: [DISCUSS] Direction of HBCK2

Toshihiro Suzuki Wed, 29 May 2019 07:12:05 -0700

Thanks Wellington.

> I guess those can still be fixed with some combinations of commands
> today, such as merge/assign.


Let me explain the situation I faced in the customer's cluster in a little
more detail. The table data in HDFS seemed to be intact, but some of the
table's metadata (in hbase:meta) had been lost, so I needed to rebuild the
meta from the HDFS data. Can we still fix this case with some combination
of the commands available today? If so, I would appreciate it if you could
suggest the steps.

> And focus on fixing the main root cause of such problems, as a means to
> soften the need to use such commands.

Yes, correct. I usually do that, but I didn't in that case.


On Wed, May 29, 2019 at 5:47 AM Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

> Thanks Toshihiro! I guess those can still be fixed with some combinations
> of commands today, such as merge/assign. Of course, that requires some
> extra scripting and log reading in cases where many regions are in an
> inconsistent state; maybe we should work on providing a one-liner command
> that relies on the existing ones. And focus on fixing the main root cause
> of such problems, as a means to soften the need to use such commands.
>
> I'm not really a fan of OfflineMetaRepair, nor of the hbck1 fix
> holes/overlaps options, and would rather not have those back. Sure, those
> are easy and convenient to trigger, but hbck1 reports are sometimes
> misleading (for instance, it reports holes when one or more regions in the
> chain are simply not online), and that, combined with the availability of
> such heavy hammers, has led inexperienced operators to run them and end up
> in a worse state.
>
> On Wed, May 29, 2019 at 13:22 Toshihiro Suzuki <brfrn...@apache.org>
> wrote:
>
> > Hi Wellington,
> >
> > I actually saw table holes in a customer's cluster, and I fixed the
> > issues with the workaround I mentioned in HBASE-21665
> > <https://issues.apache.org/jira/browse/HBASE-21665>. I didn't dig into
> > why the table holes happened at that time because the customer didn't
> > want me to.
> >
> > However, IMO, whatever the reason, we should have a direct way to fix
> > holes and overlaps.
> >
> > On Wed, May 29, 2019 at 4:57 AM Wellington Chevreuil <
> > wellington.chevre...@gmail.com> wrote:
> >
> > > So JMS, Toshihiro, it seems like upgrading from some 1.x versions to
> > > 2.x consistently triggers this problem? Do you guys know if there are
> > > any open bug JIRAs covering these scenarios? If not, and if you guys
> > > have enough resources to investigate it, maybe it's worth opening a
> > > specific JIRA?
> > >
> > > On Wed, May 29, 2019 at 11:40 Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org> wrote:
> > >
> > > > Personally, when I tried to upgrade from 1.4.x to 2.2.x, I ended up
> > > > in a situation where my meta was empty and had to be repaired, but
> > > > there was no OfflineMetaRepair for 2.2.x, so I had to delete all my
> > > > tables, do a brand-new installation, recreate the tables, and bulk
> > > > load the data back into them. I would have been happy to have an
> > > > OfflineMetaRepair.
> > > >
> > > > But it's more like an experimental cluster than a production one...
> > > >
> > > > JMS
> > > >
> > > > On Wed, May 29, 2019 at 06:36 Wellington Chevreuil <
> > > > wellington.chevre...@gmail.com> wrote:
> > > >
> > > > > Interesting. Among our customer base (running cdh6.1.x/hbase2.1.1,
> > > > > cdh6.2/hbase2.1.2), I haven't seen any cases where OfflineMetaRepair
> > > > > was really required. The majority of the RIT (region in transition)
> > > > > issues I came across on HBase 2.x were related to AP/SCP
> > > > > (AssignProcedure/ServerCrashProcedure) failures, most of which could
> > > > > be sorted out with the hbck2 commands available at the time (in some
> > > > > cases, this required some CLI scripting to build up a "bulk" assign
> > > > > command).
> > > > >
> > > > > On Wed, May 29, 2019 at 00:55 Toshihiro Suzuki <brfrn...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi Josh,
> > > > > >
> > > > > > Thank you for the explanation. I agree with the direction for
> > HBCK2.
> > > > > >
> > > > > > The problem I wanted to point out in the Jira is that, until we
> > > > > > implement the features you mentioned, we don't have any direct way
> > > > > > to fix holes and overlaps. Holes and overlaps can be created by
> > > > > > bugs or operational errors, so I think we should be able to fix
> > > > > > them.
> > > > > >
> > > > > > I thought OfflineMetaRepair could serve as a workaround for these
> > > > > > issues until we implement those HBCK2 features.
> > > > > >
> > > > > > Regards,
> > > > > > Toshi
> > > > > >
> > > > > >
> > > > > > On Tue, May 28, 2019 at 9:12 AM Josh Elser <els...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Context: https://issues.apache.org/jira/browse/HBASE-21665
> > > > > > >
> > > > > > > I left a comment on the above issue about what I thought would be
> > > > > > > good things to build into HBCK2: a focus on specific "primitive"
> > > > > > > operations that an admin/operator could use to help repair an
> > > > > > > otherwise broken HBase installation. Some examples I had in mind
> > > > > > > were:
> > > > > > >
> > > > > > > * Create an empty region (to plug a hole)
> > > > > > > * Report holes in a region chain
> > > > > > >
> > > > > > > In my head, the difference for HBCK2 was that we want to give
> > > > > > > folks the tools to fix their cluster, but we did not want to own
> > > > > > > the "just fix everything" kind of tool that HBCK1 had become. The
> > > > > > > problem with HBCK1 was that it was often difficult/problematic
> > > > > > > for us to know how to correctly fix a problem (the same problem
> > > > > > > could be corrected in different ways).
> > > > > > >
> > > > > > > Andrew had some confusion about this, so I'm not sure if I'm
> > > > > > > off-base or if we're all in agreement on the direction and just
> > > > > > > need to do a better job documenting things. Thanks for keeping me
> > > > > > > honest either way :)
> > > > > > >
> > > > > > > And in case it doesn't go without saying: HBCK2 would be
> > > > > > > something that helps fix a system, but we always want to
> > > > > > > understand the root cause of how/why we got into a situation
> > > > > > > where we needed HBCK2, and address that as well.
> > > > > > >
> > > > > > > - Josh
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
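
For reference, a "hole" is a key range that no region of a table covers, and an "overlap" is a key range covered by more than one region. Below is a minimal, hypothetical Java sketch of the kind of check a "report holes in a region chain" primitive might perform; it is not HBCK2 code and does not use any HBase API. It assumes row keys are simplified to Strings (real HBase keys are byte arrays) and follows HBase's convention that an empty start key means the start of the table and an empty end key means its end. The class name `RegionChainCheck` and the region names are illustrative only; records need Java 16+.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not HBCK2 code: report holes and overlaps in one
// table's region chain. Keys are simplified to Strings; "" as a start key
// means "table start" and "" as an end key means "table end".
public class RegionChainCheck {

  // A region covering the half-open key range [startKey, endKey).
  record Region(String name, String startKey, String endKey) {}

  static String show(String key) {
    return key.isEmpty() ? "<table start>" : key;
  }

  static List<String> check(List<Region> regions) {
    List<Region> sorted = new ArrayList<>(regions);
    // "" sorts before every other String, so the first region comes first.
    sorted.sort(Comparator.comparing(Region::startKey));

    List<String> problems = new ArrayList<>();
    String covered = "";        // keys below this are covered by some region
    boolean reachedEnd = false; // set once a region ends at the table end
    for (Region r : sorted) {
      int cmp = r.startKey().compareTo(covered);
      if (reachedEnd || cmp < 0) {
        // Region starts inside a range that is already covered.
        problems.add("overlap: " + r.name());
      } else if (cmp > 0) {
        // Gap between the covered range and this region's start key.
        problems.add("hole: [" + show(covered) + ", " + r.startKey() + ")");
      }
      if (r.endKey().isEmpty()) {
        reachedEnd = true;
      } else if (r.endKey().compareTo(covered) > 0) {
        covered = r.endKey();
      }
    }
    if (!reachedEnd) {
      // No region extends to the end of the table.
      problems.add("hole: [" + show(covered) + ", <table end>)");
    }
    return problems;
  }

  public static void main(String[] args) {
    List<Region> regions = List.of(
        new Region("r1", "", "b"),
        new Region("r2", "c", "f"),
        new Region("r3", "e", ""));
    check(regions).forEach(System.out::println);
  }
}
```

Running main prints "hole: [b, c)" and "overlap: r3" (r3 starts at e, inside r2's range, which ends at f). The sketch only reports problems; choosing a fix (for example, plugging a hole with an empty region, or merging overlapping regions) is left to the operator, in keeping with the "primitive operations" direction described in the thread.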
