How do we currently support S3 as HFile storage? I do not think we have added aws-sdk as a direct dependency in HBase, have we?
Viraj Jasani <[email protected]> wrote on Fri, Mar 17, 2023 at 04:37:

> +1, similar to what was done in the past for using
> HdfsDataOutputStreamBuilder that was available since hadoop 2.9 or so I
> think.
>
>
> On Thu, Mar 16, 2023 at 1:04 PM Andrew Purtell <[email protected]>
> wrote:
>
> > It should be done with reflection rather than take a direct dependency,
> > until Hadoop common interfaces are available in what we consider the lowest
> > supported version.
> >
> > > On Mar 16, 2023, at 12:35 PM, Viraj Jasani <[email protected]> wrote:
> > >
> > > It would be nice using PathCapabilities to determine lease recovery as a
> > > feature flag.
> > > In fact, s3a and abfs have lots of feature flags being derived from this
> > > API already. It would be good for dfs and ozone to recognize lease recovery
> > > as a capability.
> > >
> > > However, this alone might not be sufficient and something like
> > > RecoverableFileSystem interface would be helpful as long as we can abstract
> > > out lease recovery (and safe mode etc) options as hbase anyways need to
> > > perform them.
> > >
> > > Hence, having both: a) path capability to identify if lease recovery etc
> > > features are available and b) a new FileSystem interface that both dfs and
> > > ozone can implement, would be great IMHO. Because even if we just have path
> > > capability for the feature flag, we would still end up adding ozone
> > > dependency (unless done with reflection as Andrew mentioned) to perform
> > > lease recovery unless lease recovery is abstracted out somewhere in hadoop.
> > >
> > >> One of the original worries is if the Hadoop/HDFS community
> > >> would reject our proposal when we change the base interface/abstract class
> > >> in FileSystem (if it's non-backward compatible).
> > >
> > > I believe, new IA.Public interface in hadoop that can abstract out lease
> > > recovery etc would have less likelihood of getting rejected than "making
> > > changes in FileSystem directly".
> > >
> > >
> > >> On Thu, Mar 16, 2023 at 2:07 AM Tak Lon (Stephen) Wu <[email protected]>
> > >> wrote:
> > >>
> > >> In addition, I'm yet confirm but based on another search in the hadoop
> > >> code, we may be able to add recover lease as a feature flag in
> > >> CommonPathCapabilities [3] and can be used by the interface of
> > >> PathCapabilities#hasPathCapability [4]. (this is similar to
> > >> StreamCapabilities as mentioned by Viraj)
> > >>
> > >> 3.
> > >> https://github.com/apache/hadoop/blob/branch-3.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonPathCapabilities.java
> > >> 4.
> > >> https://github.com/apache/hadoop/blob/branch-3.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PathCapabilities.java
> > >>
> > >> -Stephen
> > >>
> > >>> On Thu, Mar 16, 2023 at 12:00 AM Tak Lon (Stephen) Wu <[email protected]>
> > >>> wrote:
> > >>>
> > >>> Thanks everyone ! Sean helped to clarify that something like DFS specific
> > >>> APIs used by HBase has been in-place in many HBase modules as the feature
> > >>> implementation but yet standardized in hadoop general FileSystem API, e.g.
> > >>> lease recovery. One of the original worries is if the Hadoop/HDFS community
> > >>> would reject our proposal when we change the base interface/abstract class
> > >>> in FileSystem (if it's non-backward compatible). The discussion here helps
> > >>> to confirm the direction, and let's see how we can make it generic and
> > >>> could help to avoid confusion in both places.
> > >>>
> > >>> Thanks again,
> > >>> Stephen
> > >>>
> > >>> On Wed, Mar 15, 2023 at 2:54 PM Andrew Purtell <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> Then Hadoop should add one and although we would need a reflection based
> > >>>> check in the interim we can converge toward the ideal.
> > >>>>
> > >>>> In any case I believe we can avoid a direct dependency on Ozone and should
> > >>>> strongly avoid taking such unnecessary dependencies. The Hadoop and HBase
> > >>>> build dependency sets are already very large and we and other users are
> > >>>> being hit with significant security issue remediation work, much of which
> > >>>> represents compatibility problems and is not upstreamable (like protobuf 2
> > >>>> removal in 2.x). We struggle with the existing dependencies enough already
> > >>>> at my employer.
> > >>>>
> > >>>>> On Mar 15, 2023, at 1:53 PM, Sean Busbey <[email protected]> wrote:
> > >>>>>
> > >>>>> the check that Stephen is referring to is for logic around lease recovery
> > >>>>> and not stream flush/sync. the lease recovery is specific to DFS IIRC and
> > >>>>> doesn't have a FileSystem marker.
> > >>>>>
> > >>>>>> On Wed, Mar 15, 2023 at 3:22 PM Andrew Purtell <[email protected]>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>> So we can test StreamCapabilities in code, in worst case by wrapping some
> > >>>>>> probe code during startup with try-catch and examining the exception.
> > >>>>>>
> > >>>>>>> On Wed, Mar 15, 2023 at 1:09 PM Viraj Jasani <[email protected]>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>> As of today, both WAL impl (fshlog and asyncfs) throw
> > >>>>>>> StreamLacksCapabilityException if the FS Data OutputStream probe fails for
> > >>>>>>> Hflush/Hsync:
> > >>>>>>>
> > >>>>>>> StreamLacksCapabilityException(StreamCapabilities.HFLUSH)
> > >>>>>>> and
> > >>>>>>> StreamLacksCapabilityException(StreamCapabilities.HSYNC)
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Wed, Mar 15, 2023 at 12:51 PM Andrew Purtell <[email protected]>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Does Hadoop have a marker interface that lets an application know its
> > >>>>>>>> FileSystem instances can support hsync/hflush? Ideally all we should need
> > >>>>>>>> to do is test with instanceof for that marker and use reflection (in the
> > >>>>>>>> worst case) to get a handle to the hsync or hflush method, and then call
> > >>>>>>>> it. This approach should be taken wherever we have a requirement to use a
> > >>>>>>>> special WAL specific API provided by the underlying FileSystem, so we can
> > >>>>>>>> abstract it sufficiently to not require a direct dependency on Ozone or S3A
> > >>>>>>>> or any non HDFS filesystem.
> > >>>>>>>>
> > >>>>>>>> On Wed, Mar 15, 2023 at 12:31 PM Tak Lon (Stephen) Wu <[email protected]>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi team,
> > >>>>>>>>>
> > >>>>>>>>> Recently, Wei-Chiu and I have been discussing about if HBase can use
> > >>>>>>>>> Ozone as another storage as WAL (see the hsync and hflush JIRAs [1])
> > >>>>>>>>> and HFile, for HFile it’s pluggable by configuring the file system to
> > >>>>>>>>> use Ozone File System (Ozone)
> > >>>>>>>>>
> > >>>>>>>>> But we found that the WAL it’s a bit different, especially
> > >>>>>>>>> RecoverLeaseFSUtils#recoverFileLease [2], it has one check about if
> > >>>>>>>>> the file system is an instance of HDFS, and thus WAL recovery to
> > >>>>>>>>> execute file lease recovery from RS crashes. Here, if we would like to
> > >>>>>>>>> add Ozone, it does not matter by importing as a direct dependency to
> > >>>>>>>>> perform similar lease recovery or via reflection by class name in
> > >>>>>>>>> plaintext String, we still need to somehow introduce Ozone to be
> > >>>>>>>>> another supported file system. (we can discuss how we can implement
> > >>>>>>>>> better as well)
> > >>>>>>>>>
> > >>>>>>>>> We also found other places e.g. FSUtils and HFileSystem have used
> > >>>>>>>>> DistributedFileSystem, but it should be able to move them into either
> > >>>>>>>>> hbase-asyncfs or a new FS related component to separate the use of
> > >>>>>>>>> different supported file systems.
> > >>>>>>>>>
> > >>>>>>>>> So, we’re wondering if anyone would have any objections to adding
> > >>>>>>>>> Ozone as a dependency to hbase-asyncfs? or if you have a better idea
> > >>>>>>>>> how this could be added without adding Ozone as dependency, please
> > >>>>>>>>> feel free to comment on this thread.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> [1] Ozone is working on support for hsync and hflush,
> > >>>>>>>>> https://issues.apache.org/jira/browse/HDDS-7593,
> > >>>>>>>>> https://issues.apache.org/jira/browse/HDDS-4353
> > >>>>>>>>> [2] RecoverLeaseFSUtils#recoverFileLease,
> > >>>>>>>>> https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L53-L63
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Stephen
> > >
> > > --
> > > Thanks,
> > > Viraj
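
For reference, the reflection approach suggested in the quoted thread could look roughly like the sketch below. It is only an illustration under stated assumptions, not what HBase does today: FakeLeaseFs and FakeObjectStoreFs are hypothetical stand-ins so the example runs without Hadoop on the classpath, and the String path parameter is a simplification (as far as I know, the real DistributedFileSystem#recoverLease takes a Hadoop Path). The point is that the caller looks the method up by name at runtime and never imports the implementation class.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Sketch: call lease recovery by name, with no compile-time dependency on the FS class. */
public final class ReflectiveLeaseRecovery {

  /** Hypothetical stand-in for a filesystem with lease recovery (not a Hadoop class). */
  public static final class FakeLeaseFs {
    public boolean recoverLease(String path) {
      // Pretend recovery succeeds for any non-empty path.
      return path != null && !path.isEmpty();
    }
  }

  /** Hypothetical stand-in for a store that has no lease concept at all. */
  public static final class FakeObjectStoreFs { }

  private ReflectiveLeaseRecovery() { }

  /** Returns the public boolean recoverLease(String) method of fs, or null if absent. */
  private static Method findRecoverLease(Object fs) {
    try {
      Method m = fs.getClass().getMethod("recoverLease", String.class);
      return m.getReturnType() == boolean.class ? m : null;
    } catch (NoSuchMethodException e) {
      return null;
    }
  }

  /** True if fs exposes a lease recovery method we can call reflectively. */
  public static boolean supportsLeaseRecovery(Object fs) {
    return findRecoverLease(fs) != null;
  }

  /**
   * Recovers the lease when the filesystem supports it. Returns true when the
   * lease was recovered or when there is nothing to recover (no such method).
   */
  public static boolean recoverLeaseIfSupported(Object fs, String path) {
    Method m = findRecoverLease(fs);
    if (m == null) {
      return true; // no lease concept on this filesystem
    }
    try {
      return (Boolean) m.invoke(fs, path);
    } catch (IllegalAccessException | InvocationTargetException e) {
      throw new IllegalStateException(
          "recoverLease failed on " + fs.getClass().getName(), e);
    }
  }
}
```

With a PathCapabilities flag for lease recovery, as proposed in the thread, the method lookup would only be the fallback for Hadoop versions that lack the flag.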
