How do we currently support S3 as HFile storage? I do not think we have
added aws-sdk as a direct dependency in HBase.

Viraj Jasani <[email protected]> wrote on Fri, Mar 17, 2023 at 04:37:

> +1, similar to what was done in the past for using
> HdfsDataOutputStreamBuilder that was available since hadoop 2.9 or so I
> think.
>
>
> On Thu, Mar 16, 2023 at 1:04 PM Andrew Purtell <[email protected]>
> wrote:
>
> > It should be done with reflection rather than take a direct dependency,
> > until Hadoop common interfaces are available in what we consider the
> > lowest supported version.
> >
> > > On Mar 16, 2023, at 12:35 PM, Viraj Jasani <[email protected]> wrote:
> > >
> > > It would be nice to use PathCapabilities to determine lease recovery as a
> > > feature flag.
> > > In fact, s3a and abfs already derive lots of feature flags from this
> > > API. It would be good for dfs and ozone to recognize lease recovery
> > > as a capability.
> > >
> > > However, this alone might not be sufficient, and something like a
> > > RecoverableFileSystem interface would be helpful, as long as we can
> > > abstract out the lease recovery (and safe mode etc.) options, since HBase
> > > needs to perform them anyway.
> > >
> > > Hence, having both a) a path capability to identify whether lease recovery
> > > etc. features are available and b) a new FileSystem interface that both dfs
> > > and ozone can implement would be great IMHO. Even if we just have a path
> > > capability for the feature flag, we would still end up adding an ozone
> > > dependency (unless done with reflection, as Andrew mentioned) to perform
> > > lease recovery, unless lease recovery is abstracted out somewhere in
> > > hadoop.
> > >
> > >> One of the original worries is if the Hadoop/HDFS community
> > >> would reject our proposal when we change the base interface/abstract
> > >> class in FileSystem (if it's non-backward compatible).
> > >
> > > I believe a new IA.Public interface in hadoop that can abstract out lease
> > > recovery etc. would be less likely to be rejected than "making
> > > changes in FileSystem directly".
> > >
> > >
> > >> On Thu, Mar 16, 2023 at 2:07 AM Tak Lon (Stephen) Wu <[email protected]> wrote:
> > >>
> > >> In addition, I have yet to confirm this, but based on another search of
> > >> the hadoop code, we may be able to add lease recovery as a feature flag in
> > >> CommonPathCapabilities [3], which could then be queried through
> > >> PathCapabilities#hasPathCapability [4]. (This is similar to
> > >> StreamCapabilities, as mentioned by Viraj.)
> > >>
> > >> 3. https://github.com/apache/hadoop/blob/branch-3.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonPathCapabilities.java
> > >> 4. https://github.com/apache/hadoop/blob/branch-3.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PathCapabilities.java
> > >>
> > >> -Stephen
> > >>
> > >>> On Thu, Mar 16, 2023 at 12:00 AM Tak Lon (Stephen) Wu <[email protected]> wrote:
> > >>>
> > >>> Thanks everyone! Sean helped to clarify that DFS-specific APIs, e.g.
> > >>> lease recovery, are already used in many HBase modules as part of
> > >>> feature implementations but are not yet standardized in the general
> > >>> hadoop FileSystem API. One of the original worries is if the Hadoop/HDFS
> > >>> community would reject our proposal when we change the base
> > >>> interface/abstract class in FileSystem (if it's non-backward compatible).
> > >>> The discussion here helps to confirm the direction, so let's see how we
> > >>> can make it generic and avoid confusion in both places.
> > >>>
> > >>> Thanks again,
> > >>> Stephen
> > >>>
> > >>> On Wed, Mar 15, 2023 at 2:54 PM Andrew Purtell <[email protected]> wrote:
> > >>>
> > >>>> Then Hadoop should add one and although we would need a reflection based
> > >>>> check in the interim we can converge toward the ideal.
> > >>>>
> > >>>> In any case I believe we can avoid a direct dependency on Ozone and should
> > >>>> strongly avoid taking such unnecessary dependencies. The Hadoop and HBase
> > >>>> build dependency sets are already very large and we and other users are
> > >>>> being hit with significant security issue remediation work, much of which
> > >>>> represents compatibility problems and is not upstreamable (like protobuf 2
> > >>>> removal in 2.x). We struggle with the existing dependencies enough already
> > >>>> at my employer.
> > >>>>
> > >>>>> On Mar 15, 2023, at 1:53 PM, Sean Busbey <[email protected]> wrote:
> > >>>>>
> > >>>>> The check that Stephen is referring to is for logic around lease recovery
> > >>>>> and not stream flush/sync. The lease recovery is specific to DFS IIRC and
> > >>>>> doesn't have a FileSystem marker.
> > >>>>>
> > >>>>>> On Wed, Mar 15, 2023 at 3:22 PM Andrew Purtell <[email protected]> wrote:
> > >>>>>>
> > >>>>>> So we can test StreamCapabilities in code, in worst case by wrapping some
> > >>>>>> probe code during startup with try-catch and examining the exception.
> > >>>>>>
> > >>>>>>> On Wed, Mar 15, 2023 at 1:09 PM Viraj Jasani <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>> As of today, both WAL impls (fshlog and asyncfs) throw
> > >>>>>>> StreamLacksCapabilityException if the FSDataOutputStream probe fails for
> > >>>>>>> hflush/hsync:
> > >>>>>>>
> > >>>>>>> StreamLacksCapabilityException(StreamCapabilities.HFLUSH)
> > >>>>>>> and
> > >>>>>>> StreamLacksCapabilityException(StreamCapabilities.HSYNC)
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Wed, Mar 15, 2023 at 12:51 PM Andrew Purtell <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> Does Hadoop have a marker interface that lets an application know its
> > >>>>>>>> FileSystem instances can support hsync/hflush? Ideally all we should need
> > >>>>>>>> to do is test with instanceof for that marker and use reflection (in the
> > >>>>>>>> worst case) to get a handle to the hsync or hflush method, and then call
> > >>>>>>>> it. This approach should be taken wherever we have a requirement to use a
> > >>>>>>>> special WAL specific API provided by the underlying FileSystem, so we can
> > >>>>>>>> abstract it sufficiently to not require a direct dependency on Ozone or
> > >>>>>>>> S3A or any non HDFS filesystem.
> > >>>>>>>>
> > >>>>>>>> On Wed, Mar 15, 2023 at 12:31 PM Tak Lon (Stephen) Wu <[email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi team,
> > >>>>>>>>>
> > >>>>>>>>> Recently, Wei-Chiu and I have been discussing whether HBase can use
> > >>>>>>>>> Ozone as another storage for WAL (see the hsync and hflush JIRAs [1])
> > >>>>>>>>> and HFile. For HFile, it’s pluggable by configuring the file system to
> > >>>>>>>>> use the Ozone File System (Ozone).
> > >>>>>>>>>
> > >>>>>>>>> But we found that the WAL is a bit different, especially
> > >>>>>>>>> RecoverLeaseFSUtils#recoverFileLease [2]: it has a check for whether
> > >>>>>>>>> the file system is an instance of HDFS, and if so, WAL recovery
> > >>>>>>>>> executes file lease recovery after RS crashes. Here, if we would like to
> > >>>>>>>>> add Ozone, it does not matter whether we import it as a direct dependency
> > >>>>>>>>> to perform similar lease recovery or use reflection by class name in a
> > >>>>>>>>> plaintext String; we still need to somehow introduce Ozone as
> > >>>>>>>>> another supported file system. (We can discuss how to implement this
> > >>>>>>>>> better as well.)
> > >>>>>>>>>
> > >>>>>>>>> We also found other places, e.g. FSUtils and HFileSystem, that use
> > >>>>>>>>> DistributedFileSystem, but it should be possible to move them into either
> > >>>>>>>>> hbase-asyncfs or a new FS related component to separate the use of
> > >>>>>>>>> different supported file systems.
> > >>>>>>>>>
> > >>>>>>>>> So, we’re wondering if anyone would have any objections to adding
> > >>>>>>>>> Ozone as a dependency to hbase-asyncfs? Or if you have a better idea
> > >>>>>>>>> how this could be added without adding Ozone as a dependency, please
> > >>>>>>>>> feel free to comment on this thread.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> [1] Ozone is working on support for hsync and hflush,
> > >>>>>>>>> https://issues.apache.org/jira/browse/HDDS-7593,
> > >>>>>>>>> https://issues.apache.org/jira/browse/HDDS-4353
> > >>>>>>>>> [2] RecoverLeaseFSUtils#recoverFileLease,
> > >>>>>>>>> https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L53-L63
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Stephen
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>
> > >
> > >
> > > --
> > > Thanks,
> > > Viraj
> >
>
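
For reference, the reflection approach Andrew suggests in the quoted thread
could look roughly like the sketch below. This is only an illustration under
stated assumptions, not what HBase does today: the class name
ReflectiveLeaseRecovery and the two Fake* stand-ins are hypothetical and exist
only so the example runs without Hadoop on the classpath. The method name
recoverLease with a boolean return mirrors the method that HDFS's
DistributedFileSystem exposes; whether Ozone would expose the same shape is an
assumption.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper, not HBase code: looks up recoverLease by name at
// runtime so no compile-time dependency on a concrete file system is needed.
final class ReflectiveLeaseRecovery {
  private ReflectiveLeaseRecovery() {}

  // Returns the public boolean recoverLease(pathType) method of fs's class,
  // or null if the class has no such method. The parameter type must match
  // exactly, which is how Class#getMethod resolves methods.
  static Method findRecoverLease(Object fs, Class<?> pathType) {
    try {
      Method m = fs.getClass().getMethod("recoverLease", pathType);
      return m.getReturnType() == boolean.class ? m : null;
    } catch (NoSuchMethodException e) {
      return null;
    }
  }

  // Returns true only if fs offers recoverLease and the call returned true.
  // Returns false if the method is missing, inaccessible, or returned false.
  static boolean tryRecoverLease(Object fs, Object path) {
    Method m = findRecoverLease(fs, path.getClass());
    if (m == null) {
      return false;
    }
    try {
      return (Boolean) m.invoke(fs, path);
    } catch (IllegalAccessException e) {
      return false;
    } catch (InvocationTargetException e) {
      // Surface the underlying failure thrown by the file system itself.
      throw new IllegalStateException("recoverLease failed", e.getCause());
    }
  }

  // Hypothetical stand-in for a file system with lease recovery.
  public static final class FakeLeaseFs {
    public boolean recoverLease(String path) {
      return true;
    }
  }

  // Hypothetical stand-in for a file system without lease recovery.
  public static final class FakeObjectStoreFs {}
}
```

A caller would pass the FileSystem instance and the WAL path, and treat a
false result as either "not supported" or "not yet recovered". Telling those
two cases apart is exactly what a path capability would make cleaner.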
