+1, similar to what was done in the past for using
HdfsDataOutputStreamBuilder that was available since hadoop 2.9 or so I
think.


On Thu, Mar 16, 2023 at 1:04 PM Andrew Purtell <[email protected]>
wrote:

> It should be done with reflection rather than take a direct dependency,
> until Hadoop common interfaces are available in what we consider the lowest
> supported version.
>
> > On Mar 16, 2023, at 12:35 PM, Viraj Jasani <[email protected]> wrote:
> >
> > It would be nice using PathCapabilities to determine lease recovery as a
> > feature flag.
> > In fact, s3a and abfs have lots of feature flags being derived from this
> > API already. It would be good for dfs and ozone to recognize lease
> recovery
> > as a capability.
> >
> > However, this alone might not be sufficient and something like
> > RecoverableFileSystem interface would be helpful as long as we can
> abstract
> > out lease recovery (and safe mode etc) options as hbase anyways need to
> > perform them.
> >
> > Hence, having both: a) path capability to identify if lease recovery etc
> > features are available and b) a new FileSystem interface that both dfs
> and
> > ozone can implement, would be great IMHO. Because even if we just have
> path
> > capability for the feature flag, we would still end up adding ozone
> > dependency (unless done with reflection as Andrew mentioned) to perform
> > lease recovery unless lease recovery is abstracted out somewhere in
> hadoop.
> >
> >> One of the original worries is if the Hadoop/HDFS community
> >> would reject our proposal when we change the base interface/abstract
> class
> >> in FileSystem (if it's non-backward compatible).
> >
> > I believe, new IA.Public interface in hadoop that can abstract out lease
> > recovery etc would have less likelihood of getting rejected than "making
> > changes in FileSystem directly".
> >
> >
> >> On Thu, Mar 16, 2023 at 2:07 AM Tak Lon (Stephen) Wu <[email protected]
> >
> >> wrote:
> >>
> >> In addition, I'm yet confirm but based on another search in the hadoop
> >> code, we may be able to add recover lease as a feature flag in
> >> CommonPathCapabilities [3] and can be used by the interface of
> >> PathCapabilities#hasPathCapability [4]. (this is similar to
> >> StreamCapabilities as mentioned by Viraj)
> >>
> >> 3.
> >>
> https://github.com/apache/hadoop/blob/branch-3.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonPathCapabilities.java
> >> 4.
> >>
> https://github.com/apache/hadoop/blob/branch-3.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PathCapabilities.java
> >>
> >> -Stephen
> >>
> >>> On Thu, Mar 16, 2023 at 12:00 AM Tak Lon (Stephen) Wu <
> [email protected]>
> >>> wrote:
> >>>
> >>> Thanks everyone ! Sean helped to clarify that something like DFS
> specific
> >>> APIs used by HBase has been in-place in many HBase modules as the
> feature
> >>> implementation but yet standardized in hadoop general FileSystem API,
> >> e.g.
> >>> lease recovery. One of the original worries is if the Hadoop/HDFS
> >> community
> >>> would reject our proposal when we change the base interface/abstract
> >> class
> >>> in FileSystem (if it's non-backward compatible). The discussion here
> >> helps
> >>> to confirm the direction, and let's see how we can make it generic and
> >>> could help to avoid confusion in both places.
> >>>
> >>> Thanks again,
> >>> Stephen
> >>>
> >>> On Wed, Mar 15, 2023 at 2:54 PM Andrew Purtell <
> [email protected]
> >>>
> >>> wrote:
> >>>
> >>>> Then Hadoop should add one and although we would need a reflection
> >> based
> >>>> check in the interim we can converge toward the ideal.
> >>>>
> >>>> In any case I believe we can avoid a direct dependency on Ozone and
> >> should
> >>>> strongly avoid taking such unnecessary dependencies. The Hadoop and
> >> HBase
> >>>> build dependency sets are already very large and we and other users
> are
> >>>> being hit with significant security issue remediation work, much of
> >> which
> >>>> represents compatibility problems and is not upstreamable (like
> >> protobuf 2
> >>>> removal in 2.x). We struggle with the existing dependencies enough
> >> already
> >>>> at my employer.
> >>>>
> >>>>> On Mar 15, 2023, at 1:53 PM, Sean Busbey <[email protected]> wrote:
> >>>>>
> >>>>> the check that Stephen is referring to is for logic around lease
> >>>> recovery
> >>>>> and not stream flush/sync. the lease recovery is specific to DFS
> >> IIRC and
> >>>>> doesn't have a FileSystem marker.
> >>>>>
> >>>>>> On Wed, Mar 15, 2023 at 3:22 PM Andrew Purtell <[email protected]
> >>>
> >>>> wrote:
> >>>>>>
> >>>>>> So we can test StreamCapabilities in code, in worst case by wrapping
> >>>> some
> >>>>>> probe code during startup with try-catch and examining the
> >> exception.
> >>>>>>
> >>>>>>> On Wed, Mar 15, 2023 at 1:09 PM Viraj Jasani <[email protected]>
> >>>> wrote:
> >>>>>>>
> >>>>>>> As of today, both WAL impl (fshlog and asyncfs) throw
> >>>>>>> StreamLacksCapabilityException if the FS Data OutputStream probe
> >> fails
> >>>>>> for
> >>>>>>> Hflush/Hsync:
> >>>>>>>
> >>>>>>> StreamLacksCapabilityException(StreamCapabilities.HFLUSH)
> >>>>>>> and
> >>>>>>> StreamLacksCapabilityException(StreamCapabilities.HSYNC)
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Mar 15, 2023 at 12:51 PM Andrew Purtell <
> >> [email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Does Hadoop have a marker interface that lets an application know
> >> its
> >>>>>>>> FileSystem instances can support hsync/hflush? Ideally all we
> >> should
> >>>>>> need
> >>>>>>>> to do is test with instanceof for that marker and use reflection
> >> (in
> >>>>>> the
> >>>>>>>> worst case) to get a handle to the hsync or hflush method, and
> >> then
> >>>>>> call
> >>>>>>>> it. This approach should be taken wherever we have a requirement
> >> to
> >>>>>> use a
> >>>>>>>> special WAL specific API provided by the underlying FileSystem,
> >> so we
> >>>>>> can
> >>>>>>>> abstract it sufficiently to not require a direct dependency on
> >> Ozone
> >>>> or
> >>>>>>> S3A
> >>>>>>>> or any non HDFS filesystem.
> >>>>>>>>
> >>>>>>>> On Wed, Mar 15, 2023 at 12:31 PM Tak Lon (Stephen) Wu <
> >>>>>> [email protected]
> >>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi team,
> >>>>>>>>>
> >>>>>>>>> Recently, Wei-Chiu and I have been discussing about if HBase can
> >> use
> >>>>>>>>> Ozone as another storage as WAL (see the hsync and hflush JIRAs
> >> [1])
> >>>>>>>>> and HFile, for HFile it’s pluggable by configuring the file
> >> system to
> >>>>>>>>> use Ozone File System (Ozone)
> >>>>>>>>>
> >>>>>>>>> But we found that the WAL it’s a bit different, especially
> >>>>>>>>> RecoverLeaseFSUtils#recoverFileLease [2], it has one check about
> >> if
> >>>>>>>>> the file system is an instance of HDFS, and thus WAL recovery to
> >>>>>>>>> execute file lease recovery from RS crashes. Here, if we would
> >> like
> >>>>>> to
> >>>>>>>>> add Ozone, it does not matter by importing as a direct
> >> dependency to
> >>>>>>>>> perform similar lease recovery or via reflection by class name in
> >>>>>>>>> plaintext String, we still need to somehow introduce Ozone to be
> >>>>>>>>> another supported file system. (we can discuss how we can
> >> implement
> >>>>>>>>> better as well)
> >>>>>>>>>
> >>>>>>>>> We also found other places e.g. FSUtils and HFileSystem have used
> >>>>>>>>> DistributedFileSystem, but it should be able to move them into
> >> either
> >>>>>>>>> hbase-asyncfs or a new FS related component to separate the use
> >> of
> >>>>>>>>> different supported file systems.
> >>>>>>>>>
> >>>>>>>>> So, we’re wondering if anyone would have any objections to adding
> >>>>>>>>> Ozone as a dependency to hbase-asyncfs? or if you have a better
> >> idea
> >>>>>>>>> how this could be added without adding Ozone as dependency,
> >> please
> >>>>>>>>> feel free to comment on this thread.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> [1] Ozone is working on support for hsync and hflush,
> >>>>>>>>> https://issues.apache.org/jira/browse/HDDS-7593,
> >>>>>>>>> https://issues.apache.org/jira/browse/HDDS-4353
> >>>>>>>>> [2] RecoverLeaseFSUtils#recoverFileLease,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L53-L63
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Stephen
> >>>>>>>
> >>>>>>
> >>>>
> >>
> >
> >
> > --
> > Thanks,
> > Viraj
>

Reply via email to