There are some recovery cases where the cluster cannot be expected to be up
and running. What happens if we have no tooling for those? The user has a
dead cluster. So I don't think a requirement that the cluster always be up
and running is sufficient. For this type of recovery, operator-tools must
be able to parse and write on-disk formats. On the other hand, hopefully the
cases where the cluster cannot be brought up are rare. In HBase 1, we had
OfflineMetaRepair. In my operations it has occasionally been necessary,
especially in test environments where users are not always clueful, and it
has shortened incident time from many hours to less than one hour. The
alternative would have been a rebuild from scratch with total data loss,
which is a totally unsatisfying user experience.
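
To make the on-disk point a bit more concrete, here is a minimal sketch of the kind of self-contained read/write code an offline tool would need to carry. Everything in it is an assumption for illustration: the class name FramedRecord is hypothetical, and the [length][payload][CRC32] framing is only an example layout, not a statement of the actual store file list format. The payload (for example, a serialized proto) is treated as opaque bytes.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical helper: frames an opaque payload as [int length][payload][int crc32]. */
public final class FramedRecord {
  private FramedRecord() {}

  /** Wraps the payload with a 4-byte length prefix and a 4-byte CRC32 trailer. */
  public static byte[] encode(byte[] payload) {
    ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 4);
    buf.putInt(payload.length);
    buf.put(payload);
    buf.putInt(checksum(payload));
    return buf.array();
  }

  /** Returns the payload, or throws if the frame is truncated or corrupt. */
  public static byte[] decode(byte[] framed) {
    if (framed.length < 8) {
      throw new IllegalArgumentException("frame too short: " + framed.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(framed);
    int length = buf.getInt();
    // The declared length must account for every byte between prefix and trailer.
    if (length != framed.length - 8) {
      throw new IllegalArgumentException("unexpected length: " + length);
    }
    byte[] payload = new byte[length];
    buf.get(payload);
    int expected = buf.getInt();
    if (expected != checksum(payload)) {
      throw new IllegalArgumentException("checksum mismatch");
    }
    return payload;
  }

  /** CRC32 of the payload, narrowed to an int so it fits the 4-byte trailer. */
  private static int checksum(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data);
    return (int) crc.getValue();
  }
}
```

An offline tool could read the bytes with java.nio.file.Files.readAllBytes, call decode, repair the payload, and write encode(...) back, all without a running cluster. The cost is the one already raised in this thread: any change to the real format has to be mirrored in the tool.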


On Sun, Feb 20, 2022 at 4:29 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

> Sorry a bit late...
>
> IIRC, the design of HBCK2 is that most of the actual fix logic should be
> done inside hbase (usually as a procedure), and hbase-operator-tools is
> just a facade for calling these methods. It queries the cluster to find
> out which features are supported. So in general, the design here is to
> always have the cluster up when fixing. We have a maintenance mode where we
> just bring up HMaster and bring the meta table online, without loading any
> other regions.
>
> So I prefer we just use snapshot dependencies of hbase in HBCK2. It is not
> a big deal for end users: if we have not made the release yet, the new
> fixing options can never actually be used against a production cluster.
>
> Anyway, this means we need to publish nightly builds then.
>
> Thanks.
>
> On Fri, Feb 18, 2022 at 06:40, Peter Somogyi <psomo...@apache.org> wrote:
>
> > Makes sense. Thanks Andrew for clarifying!
> >
> > On Thu, Feb 17, 2022, 21:28 Andrew Purtell <apurt...@apache.org> wrote:
> >
> > > On Thu, Feb 17, 2022 at 12:19 PM Peter Somogyi <psomo...@apache.org>
> > > wrote:
> > >
> > > > > I like the idea of including the store file tracking in 2.5.0 to
> > > > > unblock the HBCK development efforts.
> > > > >
> > > > > Unfortunately, I was not following its development that much. Can it
> > > > > cause any issues if 2.5.0 has the feature but later an incompatible
> > > > > change is needed for SFT? Can it be marked as a beta feature where we
> > > > > are free to modify interfaces?
> > > >
> > >
> > > > Yes, this is what I meant when I suggested we could mark it as
> > > > 'experimental'. We have done this in the past. The word 'experimental'
> > > > is prominently included adjacent to any discussion of the feature in
> > > > documentation and release notes. When we feel for sure it is stable
> > > > that word is removed. We can do something different this time of course
> > > > but that has been our past practice when introducing new functionality
> > > > into releasing code lines. And I presume we would use the Evolving
> > > > interface annotation everywhere.
> > >
> > > > > Peter
> > > > >
> > > > > On Tue, Feb 15, 2022 at 11:07 PM Andrew Purtell
> > > > > <andrew.purt...@gmail.com> wrote:
> > > >
> > > > > > Another option which I do not see mentioned yet is to extract the
> > > > > > relevant common proto and source files from the ‘hbase’ repository
> > > > > > into a new repository (‘hbase-storage’?), from which we would
> > > > > > release artifacts to be consumed by both hbase and
> > > > > > hbase-operator-tools. This maintains D.R.Y. through refactoring
> > > > > > although it may down the road cause some complexity in coordinating
> > > > > > evolution among the three (if not more) repositories and releases
> > > > > > produced from them. This is like Josh’s Option 1 but without
> > > > > > duplication.
> > > > > >
> > > > > > Regarding the option 2 issue… If it would help we can drop SFT into
> > > > > > branch-2.5 along with the log4j2 changes and release 2.5.0
> > > > > > afterward. We are taking the opportunity of this minor increment to
> > > > > > accelerate log4j1 retirement, which is why it’s still waiting (but
> > > > > > not for long). We can use the same opportunity to release SFT even
> > > > > > if we designate it as an experimental feature if that would simplify
> > > > > > some other logistics. For what it’s worth.
> > > > >
> > > > > > > On Feb 15, 2022, at 7:44 AM, Josh Elser <els...@apache.org> wrote:
> > > > > > >
> > > > > > > I was talking with Szabolcs prior to him sending this one, and it's
> > > > > > > a tricky issue for sure.
> > > > > > >
> > > > > > > To date, we've solved any HBase API issues by copying code into
> > > > > > > HBCK2 e.g. HBCKMetaTableAccessor which copies parts of
> > > > > > > MetaTableAccessor, or we push the logic down server-side to the
> > > > > > > HBase Master and invoke it over the Hbck RPC interface.
> > > > > > >
> > > > > > > I definitely want to avoid HBase version specific builds of the
> > > > > > > operator-tools, so that is not an option in my mind for 2.x. The
> > > > > > > discussions we had (that I remember) around HBCK2 were limited in
> > > > > > > scope to HBase 2.x.
> > > > > > >
> > > > > > > Option 1: we copy the necessary proto files from HBase into the
> > > > > > > operator-tools and try to remember that, if we make any change to
> > > > > > > the serialization of the storefile list files, we have to copy that
> > > > > > > change to HBCK2. Brittle on the surface but effective.
> > > > > > >
> > > > > > > Option 2: We bump HBCK2 to hbase-2.6.0-SNAPSHOT. Problematic until
> > > > > > > we make an HBase 2.6.0[-alpha] release. We should already have wire
> > > > > > > compat between all of HBase 2.x which makes that a non-issue.
> > > > > > >
> > > > > > > Option 3: We create an HBCK3 targeted for HBase 3.x. I'm not
> > > > > > > convinced we need to do that (hbck for hbase 3.x would be just like
> > > > > > > hbck for hbase 2.x). This would also not solve the problem for the
> > > > > > > SFT feature in hbase 2.6.
> > > > > > >
> > > > > > > I think option 3 is a no-go. I am leaning towards option 1 at this
> > > > > > > point. Hopefully my thought process is helpful for others to weigh
> > > > > > > in.
> > > > > >
> > > > > >
> > > > > > >> On 2/14/22 11:31 AM, Szabolcs Bukros wrote:
> > > > > > >> Hi Folks!
> > > > > > >>
> > > > > > >> While working on adding tools to handle potential FileBased
> > > > > > >> StoreFileTracker issues to HBCK2 (HBASE-26624
> > > > > > >> <https://issues.apache.org/jira/browse/HBASE-26624>) I ran into
> > > > > > >> multiple problems I'm unsure how to solve.
> > > > > > >>
> > > > > > >> First of all the tools would rely on files not yet available in any
> > > > > > >> of the released hbase artifacts. I tried to solve this without
> > > > > > >> changing the hbase dependency version to keep HBCK2 as hbase version
> > > > > > >> independent as possible, but none of the solutions I have found
> > > > > > >> looked acceptable:
> > > > > > >>  - Pushing the logic to the hbase side (as far as I can tell) is not
> > > > > > >>    feasible because it has to be able to repair meta which is easier
> > > > > > >>    when hbase is down and the tool should be able to run without a
> > > > > > >>    working hbase.
> > > > > > >>  - The files tracking the store content are serialized proto objects
> > > > > > >>    so while replicating those files in the operator tools is
> > > > > > >>    possible, it would not be pretty.
> > > > > > >>
> > > > > > >> Bumping operator tools to use hbase 2.6.0-SNAPSHOT (branch-2 has the
> > > > > > >> SFT changes) would mean that now we need that or a newer version to
> > > > > > >> build the project and a version check to avoid runtime problems with
> > > > > > >> the new tools, but otherwise this looks rather painless and
> > > > > > >> backwards compatible. I know operator tools tries to avoid having a
> > > > > > >> hbase-specific release, but having 2.6 as a min version to build
> > > > > > >> against might be acceptable.
> > > > > > >>
> > > > > > >> While looking into this I also checked what needs to be done to make
> > > > > > >> operator tools work with hbase 3.0.0-alpha-3-SNAPSHOT. Most of the
> > > > > > >> changes are backwards compatible but not all of them and the ones
> > > > > > >> that aren't would make a big chunk of Fsck unusable with older
> > > > > > >> hbases. For me that looks acceptable since this is a major version
> > > > > > >> change, but that would mean I can not rely on a potential HBCK3 to
> > > > > > >> fix SFT issues, I would also need a solution for HBCK2.
> > > > > > >>
> > > > > > >> I tried to look for plans/direction regarding the new 1.3 operator
> > > > > > >> tools but could not find any.
> > > > > > >>
> > > > > > >> Do you think it would be possible to bump the hbase version it uses
> > > > > > >> to 2.6.0-SNAPSHOT?
> > > > > > >> Do you think it would make sense to start working on a hbase3
> > > > > > >> compatible branch or is it too early?
> > > > > > >>
> > > > > > >> NOTE:
> > > > > > >> I'm aware hbase has not published SNAPSHOT builds for years, but I
> > > > > > >> do not know how the internal build system works and whether these
> > > > > > >> artifacts would be available for internal builds or not. I also do
> > > > > > >> not know whether, if necessary, they could be made available.
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Unrest, ignorance distilled, nihilistic imbeciles -
> > >     It's what we’ve earned
> > > Welcome, apocalypse, what’s taken you so long?
> > > Bring us the fitting end that we’ve been counting on
> > >    - A23, Welcome, Apocalypse
> > >
> >
>


-- 
Best regards,
Andrew
