Thanks Wellington.

> I guess those can still be fixed with some combinations of commands today,
> such as merge/assign.

Let me explain the situation I faced in the customer's cluster in a little
more detail. It seemed that the table data in HDFS was intact, but some of
the table's metadata (in hbase:meta) had been lost, so I needed to rebuild
the meta from the HDFS data. Can this case still be fixed with some
combination of commands today? If so, I would appreciate it if you could
suggest the steps.

> And focus on fixing the main root cause of such problems, as a means to
> reduce the need to use such commands.

Yes, correct. I usually do that, but I didn't in that case.

On Wed, May 29, 2019 at 5:47 AM Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

> Thanks Toshihiro! I guess those can still be fixed with some combinations
> of commands today, such as merge/assign. Of course, it requires some
> extra scripting and log reading in cases where many regions are in an
> inconsistent state; maybe we should work on providing a one-liner command
> that relies on the existing ones. And focus on fixing the main root cause
> of such problems, as a means to reduce the need to use such commands.
>
> I'm not really a fan of offlinemetarepair, nor hbck1 fix holes/overlaps,
> and would rather not have those back. Sure, those are easy and convenient
> to trigger, but hbck1 reports are sometimes misleading (for instance, it
> reports holes when region(s) in the chain is/are simply not online), and
> that, combined with the availability of such heavy hammers, has led
> inexperienced operators to run it and end up in a worse state.
>
> On Wed, May 29, 2019 at 13:22, Toshihiro Suzuki <brfrn...@apache.org>
> wrote:
>
> > Hi Wellington,
> >
> > I actually saw table holes in a customer's cluster, and I just fixed
> > the issues with the workaround I mentioned in HBASE-21665
> > <https://issues.apache.org/jira/browse/HBASE-21665>. I didn't dig into
> > the reason why the table holes happened at that time because the
> > customer didn't want me to.
> >
> > However, IMO, whatever the reason, I think we should have a direct way
> > to fix holes and overlaps.
> >
> > On Wed, May 29, 2019 at 4:57 AM Wellington Chevreuil <
> > wellington.chevre...@gmail.com> wrote:
> >
> > > So JMS, Toshihiro, it seems like upgrading from some 1.x to 2.x
> > > consistently triggers this problem? Do you guys know if there are
> > > any bug jiras open that would cover these scenarios? If not, and if
> > > you guys have enough resources for investigating it, maybe it's
> > > worth opening a specific jira?
> > >
> > > On Wed, May 29, 2019 at 11:40, Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org> wrote:
> > >
> > > > Personally, when I tried to upgrade from 1.4.x to 2.2.x I ended up
> > > > in a situation where my meta was empty and had to get it repaired,
> > > > but lacked OfflineMetaRepair for 2.2.x, so I just had to delete
> > > > all my tables, get a brand new installation, recreate the tables
> > > > and bulkload the data back into them. I would have been happy to
> > > > have an OfflineMetaRepair.
> > > >
> > > > But it's more like an experimental cluster than a production
> > > > one...
> > > >
> > > > JMS
> > > >
> > > > On Wed, May 29, 2019 at 06:36, Wellington Chevreuil <
> > > > wellington.chevre...@gmail.com> wrote:
> > > >
> > > > > Interesting, I haven't seen any cases where OfflineMetaRepair
> > > > > was really required among our customer base (running
> > > > > cdh6.1.x/hbase2.1.1, cdh6.2/hbase2.1.2). The majority of the RIT
> > > > > issues I came across on hbase 2.x were related to AP/SCP
> > > > > failures, most of which could be sorted with the hbck2 commands
> > > > > available by then (in some cases, it required some CLI scripting
> > > > > to build up a "bulk" assign command).
> > > > >
> > > > > On Wed, May 29, 2019 at 00:55, Toshihiro Suzuki <
> > > > > brfrn...@apache.org> wrote:
> > > > >
> > > > > > Hi Josh,
> > > > > >
> > > > > > Thank you for the explanation.
> > > > > > I agree with the direction for HBCK2.
> > > > > >
> > > > > > The problem I wanted to point out in the Jira is that until we
> > > > > > implement the features you mentioned, we don't have any direct
> > > > > > way to fix holes and overlaps. Holes and overlaps can be
> > > > > > created by bugs or operational errors, so I think we should be
> > > > > > able to fix these issues.
> > > > > >
> > > > > > I thought OfflineMetaRepair could be a workaround for the
> > > > > > issues until we implement the features of HBCK2.
> > > > > >
> > > > > > Regards,
> > > > > > Toshi
> > > > > >
> > > > > > On Tue, May 28, 2019 at 9:12 AM Josh Elser <els...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Context: https://issues.apache.org/jira/browse/HBASE-21665
> > > > > > >
> > > > > > > I left a comment on the above issue about what I thought
> > > > > > > good things to build into HBCK2 would be -- a focus on
> > > > > > > specific "primitive" operations that an admin/operator could
> > > > > > > use to help repair an otherwise broken HBase installation.
> > > > > > > Some examples I had in my head were:
> > > > > > >
> > > > > > > * Create an empty region (to plug a hole)
> > > > > > > * Report holes in a region chain
> > > > > > >
> > > > > > > In my head, the difference for HBCK2 was that we want to
> > > > > > > give folks the tools to fix their cluster, but we did not
> > > > > > > want to own the "just fix everything" kind of tool that
> > > > > > > HBCK1 had become. The problem with HBCK1 was that it was
> > > > > > > often difficult/problematic for us to know how to correctly
> > > > > > > fix a problem (the same problem could be corrected in
> > > > > > > different ways).
> > > > > > >
> > > > > > > Andrew had some confusion about this, so I'm not sure if I'm
> > > > > > > off-base or if we're all in agreement on direction and we
> > > > > > > just need to do a better job documenting things. Thanks for
> > > > > > > keeping me honest either way :)
> > > > > > >
> > > > > > > And just in case it doesn't go without saying, HBCK2 would
> > > > > > > be something that helps fix a system, while we want to
> > > > > > > always understand the root cause of how/why we got into a
> > > > > > > situation where we needed HBCK2 and also address that.
> > > > > > >
> > > > > > > - Josh
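To make the "report holes in a region chain" primitive from the thread concrete, here is a minimal sketch of the check itself. It is not HBCK2 code and does not talk to a cluster: the function name `find_holes_and_overlaps` and the plain `(start_key, end_key)` tuples are assumptions made for illustration, with an empty key standing for the open start or end of the table, as in HBase region boundaries.

```python
from typing import List, Tuple

# Hypothetical representation: (start_key, end_key); b"" means "open-ended".
Region = Tuple[bytes, bytes]


def find_holes_and_overlaps(regions: List[Region]):
    """Return (holes, overlaps) as lists of (start_key, end_key) key ranges.

    An empty input returns two empty lists; deciding whether "no regions"
    is itself a problem is left to the caller.
    """
    holes: List[Region] = []
    overlaps: List[Region] = []
    # Python compares bytes lexicographically by unsigned byte value,
    # so b"" (the table start) sorts first.
    ordered = sorted(regions, key=lambda r: r[0])
    if not ordered:
        return holes, overlaps

    # The first region must begin at the start of the table.
    if ordered[0][0] != b"":
        holes.append((b"", ordered[0][0]))

    covered_end = ordered[0][1]  # furthest end key seen so far
    for start, end in ordered[1:]:
        if covered_end == b"":
            # Coverage already reaches the end of the table, so any
            # further region overlaps what came before.
            overlaps.append((start, end))
            continue
        if start > covered_end:
            # Nothing covers [covered_end, start): a hole in the chain.
            holes.append((covered_end, start))
        elif start < covered_end:
            # This region begins inside an already covered range.
            overlap_end = covered_end if end == b"" else min(covered_end, end)
            overlaps.append((start, overlap_end))
        # Extend coverage if this region reaches further.
        if end == b"" or end > covered_end:
            covered_end = end

    # The last region must run to the end of the table.
    if covered_end != b"":
        holes.append((covered_end, b""))
    return holes, overlaps
```

For example, the chain `[(b"", b"b"), (b"b", b"d"), (b"f", b"")]` has one hole, between `d` and `f`. A report like this only says where the chain is broken; as discussed above, a region that is simply not online should not be confused with a real hole, and how to repair one remains the operator's decision.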