I was just describing our old design choice; it was not made by me...

In fact, I agree that if we want to operate on an active cluster, we had
better go through a procedure.
But as for having a separate HBCK2 repo outside the main code repo, I do
not see big advantages, and it will introduce more problems.

And I also agree that for some types of operations we do not need to have
an active cluster or master. But in the past we decided to require one and
introduced a maintenance mode, which cost me a lot of time when I wanted to
move the balancer code to a submodule and decouple HMaster and HRegionServer.
And again, that decision was not mine...

So in general, I agree with most of your points. What I want to say is that
we decided to go one way in the past; if we want to change course, we need
to review the old decision to see whether it is safe to break it. Otherwise
we may fall into another hole right after we climb out of the current one...

And on how to implement HBCK2, I made a mistake in proposing that HBCK2
depend on SNAPSHOT HBase. Technically there is no problem, but when we want
to make a release, this is not allowed by the ASF release rules...

Thanks.

Josh Elser <els...@apache.org> wrote on Wed, Mar 2, 2022 at 05:49:

> I tend to lean towards what Andrew is saying here, but I will also admit
> that this is in part from not having a good user-experience about
> getting up an HMaster in maintenance mode to do surgical stuff (feels
> like two steps instead of just one).
>
> Naively, rebuilding the SFT meta files from the filesystem doesn't
> require the HMaster to be up because there isn't any other "state" to
> consider (which was a big reason behind pushing the work that hbck2 was
> doing into the active master to avoid split-brain).
>
> Is doing logic in HBCK2 that doesn't talk to the HMaster a -1 from you,
> Duo? Similarly, is a utility in hbase-operator-tools (not a part of the
> hbck2 wrapper command) also a -1?
>
> Either is feasible, but I do think trying to build this SFT
> rebuilding/recovery into a maintenance-mode HMaster will be more work.
>
> On 2/21/22 12:27 PM, Andrew Purtell wrote:
> > There are some recovery cases where the cluster cannot be expected to be
> > up and running. What happens if we have no tooling for those? The user
> > has a dead cluster. So I don't think a requirement that the cluster be up
> > and running always is sufficient. For this type of recovery,
> > operator-tools must be able to parse and write on-disk formats. On the
> > other hand, hopefully the cases for which that is not true are rare. In
> > HBase 1, we had OfflineMetaRebuild. For my operations it has occasionally
> > been necessary, especially in test environments where users are not
> > always clueful, and it has shortened incident time from many hours to
> > less than one hour. The alternative would have been a rebuild from
> > scratch with total data loss, which is a totally unsatisfying user
> > experience.
> >
> >
> > On Sun, Feb 20, 2022 at 4:29 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
> >
> >> Sorry a bit late...
> >>
> >> IIRC, the design of HBCK2 is that most of the actual fix logic should be
> >> done inside hbase (usually as a procedure), and hbase-operator-tools is
> >> just a facade for calling these methods. It will query the cluster to
> >> find out which features are supported. So in general, the design here is
> >> to always have the cluster up when fixing. We have a maintenance mode
> >> where we just bring up HMaster and bring the meta table online, without
> >> loading any other regions.
> >>
> >> So I prefer we just use snapshot dependencies of hbase in HBCK2. It is
> >> not a big deal for end users because, until we have made the release,
> >> the new fixing options can never actually be used against a production
> >> cluster.
> >>
> >> Anyway, this means we need to publish nightly builds then.
> >>
> >> Thanks.
> >>
> >> Peter Somogyi <psomo...@apache.org> wrote on Fri, Feb 18, 2022 at 06:40:
> >>
> >>> Makes sense. Thanks Andrew for clarifying!
> >>>
> >>> On Thu, Feb 17, 2022, 21:28 Andrew Purtell <apurt...@apache.org> wrote:
> >>>
> >>>> On Thu, Feb 17, 2022 at 12:19 PM Peter Somogyi <psomo...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> I like the idea of including the store file tracking in 2.5.0 to
> >>>>> unblock the HBCK development efforts.
> >>>>>
> >>>>> Unfortunately, I was not following its development that much. Can it
> >>>>> cause any issues if 2.5.0 has the feature but later an incompatible
> >>>>> change is needed for SFT? Can it be marked as a beta feature where we
> >>>>> are free to modify interfaces?
> >>>>>
> >>>>
> >>>> Yes, this is what I meant when I suggested we could mark it as
> >>>> 'experimental'. We have done this in the past. The word 'experimental'
> >>>> is prominently included adjacent to any discussion of the feature in
> >>>> documentation and release notes. When we feel for sure it is stable
> >>>> that word is removed. We can do something different this time of course
> >>>> but that has been our past practice when introducing new functionality
> >>>> into releasing code lines. And I presume we would use the Evolving
> >>>> interface annotation everywhere.
> >>>>
> >>>> Peter
> >>>>>
> >>>>> On Tue, Feb 15, 2022 at 11:07 PM Andrew Purtell
> >>>>> <andrew.purt...@gmail.com> wrote:
> >>>>>
> >>>>>> Another option which I do not see mentioned yet is to extract the
> >>>>>> relevant common proto and source files from the ‘hbase’ repository
> >>>>>> into a new repository (‘hbase-storage’?), from which we would release
> >>>>>> artifacts to be consumed by both hbase and hbase-operator-tools. This
> >>>>>> maintains D.R.Y. through refactoring although it may down the road
> >>>>>> cause some complexity in coordinating evolution among the three (if
> >>>>>> not more) repositories and releases produced from them. This is like
> >>>>>> Josh’s Option 1 but without duplication.
> >>>>>>
> >>>>>> Regarding the option 2 issue… If it would help we can drop SFT into
> >>>>>> branch-2.5 along with the log4j2 changes and release 2.5.0 afterward.
> >>>>>> We are taking the opportunity of this minor increment to accelerate
> >>>>>> log4j1 retirement, which is why it’s still waiting (but not for
> >>>>>> long). We can use the same opportunity to release SFT even if we
> >>>>>> designate it as an experimental feature if that would simplify some
> >>>>>> other logistics. For what it’s worth.
> >>>>>>
> >>>>>>> On Feb 15, 2022, at 7:44 AM, Josh Elser <els...@apache.org> wrote:
> >>>>>>>
> >>>>>>> I was talking with Szabolcs prior to him sending this one, and it's
> >>>>>>> a tricky issue for sure.
> >>>>>>>
> >>>>>>> To date, we've solved any HBase API issues by copying code into
> >>>>>>> HBCK2 e.g. HBCKMetaTableAccessor which copies parts of
> >>>>>>> MetaTableAccessor, or we push the logic down server-side to the
> >>>>>>> HBase Master and invoke it over the Hbck RPC interface.
> >>>>>>>
> >>>>>>> I definitely want to avoid HBase version specific builds of the
> >>>>>>> operator-tools, so that is not an option in my mind for 2.x. The
> >>>>>>> discussions we had (that I remember) around HBCK2 were limited in
> >>>>>>> scope to HBase 2.x.
> >>>>>>>
> >>>>>>> Option 1: we copy the necessary proto files from HBase into the
> >>>>>>> operator-tools and try to remember that, if we make any change to
> >>>>>>> the serialization of the storefile list files, we have to copy that
> >>>>>>> change to HBCK2. Brittle on the surface but effective.
> >>>>>>>
> >>>>>>> Option 2: We bump HBCK2 to hbase-2.6.0-SNAPSHOT. Problematic until
> >>>>>>> we make an HBase 2.6.0[-alpha] release. We should already have wire
> >>>>>>> compat between all of HBase 2.x which makes that a non-issue.
> >>>>>>>
> >>>>>>> Option 3: We create an HBCK3 targeted for HBase 3.x. I'm not
> >>>>>>> convinced we need to do that (hbck for hbase 3.x would be just like
> >>>>>>> hbck for hbase 2.x). This would also not solve the problem for the
> >>>>>>> SFT feature in hbase 2.6.
> >>>>>>>
> >>>>>>> I think option 3 is a no-go. I am leaning towards option 1 at this
> >>>>>>> point. Hopefully my thought process is helpful for others to weigh
> >>>>>>> in.
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 2/14/22 11:31 AM, Szabolcs Bukros wrote:
> >>>>>>>> Hi Folks!
> >>>>>>>> While working on adding tools to handle potential FileBased
> >>>>>>>> StoreFileTracker issues to HBCK2 (HBASE-26624
> >>>>>>>> <https://issues.apache.org/jira/browse/HBASE-26624>) I ran into
> >>>>>>>> multiple problems I'm unsure how to solve.
> >>>>>>>> First of all the tools would rely on files not yet available in any
> >>>>>>>> of the released hbase artifacts. I tried to solve this without
> >>>>>>>> changing the hbase dependency version to keep HBCK2 as hbase
> >>>>>>>> version independent as possible, but none of the solutions I have
> >>>>>>>> found looked acceptable:
> >>>>>>>>   - Pushing the logic to the hbase side (as far as I can tell) is
> >>>>>>>>     not feasible because it has to be able to repair meta, which is
> >>>>>>>>     easier when hbase is down, and the tool should be able to run
> >>>>>>>>     without a working hbase.
> >>>>>>>>   - The files tracking the store content are serialized proto
> >>>>>>>>     objects, so while replicating those files in the operator tools
> >>>>>>>>     is possible, it would not be pretty.
> >>>>>>>> Bumping operator tools to use hbase 2.6.0-SNAPSHOT (branch-2 has the
> >>>>>>>> SFT changes) would mean that now we need that or a newer version to
> >>>>>>>> build the project and a version check to avoid runtime problems
> >>>>>>>> with the new tools, but otherwise this looks rather painless and
> >>>>>>>> backwards compatible. I know operator tools tries to avoid having a
> >>>>>>>> hbase-specific release, but having 2.6 as a min version to build
> >>>>>>>> against might be acceptable.
> >>>>>>>> While looking into this I also checked what needs to be done to make
> >>>>>>>> operator tools work with hbase 3.0.0-alpha-3-SNAPSHOT. Most of the
> >>>>>>>> changes are backwards compatible, but not all of them, and the ones
> >>>>>>>> that aren't would make a big chunk of Fsck unusable with older
> >>>>>>>> hbases. For me that looks acceptable since this is a major version
> >>>>>>>> change, but that would mean I cannot rely on a potential HBCK3 to
> >>>>>>>> fix SFT issues; I would also need a solution for HBCK2.
> >>>>>>>> I tried to look for plans/direction regarding the new 1.3 operator
> >>>>>>>> tools but could not find any.
> >>>>>>>> Do you think it would be possible to bump the hbase version it uses
> >>>>>>>> to 2.6.0-SNAPSHOT?
> >>>>>>>> Do you think it would make sense to start working on a hbase3
> >>>>>>>> compatible branch or is it too early?
> >>>>>>>> NOTE:
> >>>>>>>> I'm aware hbase has not published SNAPSHOT builds for years, but I
> >>>>>>>> do not know how the internal build system works and whether these
> >>>>>>>> artifacts would be available for internal builds or not. I also do
> >>>>>>>> not know whether they could be made available if necessary.
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best regards,
> >>>> Andrew
> >>>>
> >>>> Unrest, ignorance distilled, nihilistic imbeciles -
> >>>>      It's what we’ve earned
> >>>> Welcome, apocalypse, what’s taken you so long?
> >>>> Bring us the fitting end that we’ve been counting on
> >>>>     - A23, Welcome, Apocalypse
> >>>>
> >>>
> >>
> >
> >
>
