Thank you Istvan for the very detailed response! I will familiarize myself with the smoke tests.
On running HBase with a bundled Hadoop version that is newer than the Hadoop cluster version, and our use of reflection: I looked for examples in the codebase where reflection is used to check for the existence of a new Hadoop server-side API that may be unsupported by the Hadoop cluster version, to see how that is handled. Erasure coding support looks like a good example: if the DFS class on the HBase server classpath contains a new API which the cluster's Hadoop version does not support, the call to the new Hadoop API can be attempted and fail, and the failure looks to be handled early and cleanly in the verifySupport step -
https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/ErasureCodingUtils.java#L89-L106
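To make that concrete, here is a rough sketch of the general reflection-based, fail-early pattern (the class and method names below are made up for illustration; the real verifySupport step, as described above, goes further and exercises the call against the cluster so that missing server-side support also surfaces early):

    import java.io.IOException;
    import java.lang.reflect.Method;

    import org.apache.hadoop.fs.FileSystem;

    /**
     * Hypothetical sketch of a reflection-based capability check - not the
     * actual ErasureCodingUtils code, just the general shape of the pattern.
     */
    public final class HadoopCapabilityCheck {

      private HadoopCapabilityCheck() {
      }

      /**
       * Resolves a Hadoop API via reflection and fails early with a clear
       * message if the Hadoop classes on the classpath do not expose it,
       * instead of failing later with a NoSuchMethodError mid-request.
       */
      public static Method resolveOrFail(FileSystem fs, String methodName,
          Class<?>... parameterTypes) throws IOException {
        try {
          return fs.getClass().getMethod(methodName, parameterTypes);
        } catch (NoSuchMethodException e) {
          throw new IOException(
              "The Hadoop version on the classpath does not support " + methodName, e);
        }
      }
    }

Even when the method resolves on the client side, the call against an older cluster can of course still fail at the RPC layer, so a check like this only covers the classpath half of the mismatch.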
Given the discussion/changes in this thread and our reliance on/trust in Hadoop compatibility promises, should the following Hadoop version guidance in the reference guide be updated to something along the lines of "Running HBase with a bundled Hadoop version which does not match the Hadoop cluster/server version is expected to be OK, so long as it is a supported Hadoop version and there is Hadoop wire/API compatibility between the HBase-bundled Hadoop version and the cluster Hadoop version"? - https://hbase.apache.org/book.html#hadoop :

"Replace the Hadoop Bundled With HBase! Because HBase depends on Hadoop, it bundles Hadoop jars under its lib directory. The bundled jars are ONLY for use in stand-alone mode. In distributed mode, it is critical that the version of Hadoop that is out on your cluster match what is under HBase. Replace the hadoop jars found in the HBase lib directory with the equivalent hadoop jars from the version you are running on your cluster to avoid version mismatch issues. Make sure you replace the jars under HBase across your whole cluster. Hadoop version mismatch issues have various manifestations. Check for mismatch if HBase appears hung."

Version mismatch/Hadoop compatibility aside, my understanding is that Hadoop binary compatibility is not guaranteed because of our private Hadoop API usage (ref mailing list thread https://lists.apache.org/thread/gpcosvyv84lyc3qldjspbk11ywfqw59d ), so the guidance in the reference to swap out the Hadoop jars without recompiling against the appropriate Hadoop version is incomplete; it seems we should mention that recompilation may be necessary.

I would be happy to take on this documentation update if there is agreement that it could use one.

Thank you,
Daniel

From: dev@hbase.apache.org
At: 11/18/24 03:43:16 UTC-5:00
To: dev@hbase.apache.org
Subject: Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

>
> This is something that has been touched on but I think would be good to
> make more explicit/document - do said Hadoop compatibility guarantees mean
> that one does not have to upgrade their existing Hadoop cluster before they
> upgrade to an HBase version with a default Hadoop 3.x version that is minor
> versions ahead of their Hadoop cluster, e.g. for an HBase 2.5.y -> 2.5.z
> patch version upgrade, if Hadoop 3.4.1 is made the default for branch-2.5
> as proposed below and a user is currently running Hadoop 3.2.3? Or is that
> something that may work, but in case of issues a user is expected to fall
> back to compiling their own HBase with a lower Hadoop version as you
> mentioned Istvan?

I would go one step farther, and I would say that it is expected to work, based on our code and Hadoop's compatibility promises, and we do run nightly smoke tests for this exact case.

If this does not work for any reason, then definitely file a JIRA, and we will try to resolve it, unless it is some fundamental bug which cannot be fixed on the HBase side. But yes, if you do run into problems because of HBase being built with a newer version, then the easy short-term fix is to recompile the assembly with the older Hadoop.

HBase is written to work with older Hadoop versions. Any new interfaces are accessed via reflection, with fallback to the old behaviour. We also build HBase with the oldest supported Hadoop version during testing, which would reveal any source-code level incompatibilities. So we are covered from that angle.

By far the worst offender in using private interfaces is hbase-asyncfs, which effectively re-implements half the HDFS client code, and as such is extremely sensitive to Hadoop implementation changes. The recent problems here were not with older Hadoop versions, which are still covered by the same fallback code path, but with the newer ones, where we needed to adapt the code to the Hadoop 3.4 changes.

If any other private Hadoop interface usages have crept into HBase since the mentioned JIRA was closed, then any changes in those APIs would be caught in the CI tests, so I do not see those increasing compatibility risks. Having static analysis to catch those would still be nice.

AFAIK the rest of the Hadoop-version-dependent reflection-based Hadoop calls are far less problematic, and simply cover incompatible changes in Hadoop APIs, or use functionality/classes/methods that are simply missing in older versions (like the recent Ozone-related changes).

We also run much of the test suite with older Hadoop versions (though in that case the Hadoop client libraries are also from the same version).

The least tested path is running against an older Hadoop cluster with the new client classes. In this case HBase will run the new Hadoop client code against the old cluster, and this is where we are most exposed to Hadoop's compatibility implementation, as we depend on the new APIs working correctly with the old cluster (or more generally, on any Hadoop API working correctly with the old cluster). We do run some smoke tests for these cases, but they are not extensive, and only check that the cluster comes up and basic CRUD operations work.

As you mentioned, the problem is with the Hadoop assembly and shaded jars. If a user does run into a problem because of Hadoop breaking its compatibility promises, then it is because of the newer Hadoop libraries bundled with/shaded into HBase. Technically, replacing the bundled Hadoop libraries in the assembly, and managing the Hadoop dependencies / using the -byo-hadoop shaded libraries, would be enough. However, it is easier and less error-prone to just re-compile the whole assembly with the older Hadoop than to selectively replace JARs in the distributed version.

For clients running/built outside the Hadoop assembly classpath, you can still use the official artifacts and dependency-manage the transitive Hadoop dependency versions (see Phoenix as an example), or - preferably - use hbase-shaded-client-byo-hadoop or hbase-shaded-mapreduce with the (shaded) Hadoop client/mapreduce jars added separately.

Istvan

On Sun, Nov 17, 2024 at 10:51 PM Daniel Roudnitsky (BLOOMBERG/ 919 3RD A) <
droudnits...@bloomberg.net> wrote:

> It sounds like Hadoop compatibility guarantees should make it simpler/less
> risky for the default Hadoop version used by HBase to be bumped up.
> Some questions that come to mind: how much complication/risk is introduced by
> HBase’s use of private Hadoop interfaces that have weaker (or nonexistent?)
> Hadoop compatibility guarantees across Hadoop minor/patch versions? I know
> that AsyncFSWAL in particular is heavily dependent on private HDFS client
> internals and for that reason is very Hadoop version sensitive, with a
> fallback to FSHLog in the case of compatibility issues detected at runtime
> - https://hbase.apache.org/book.html#trouble.rs.startup.asyncfs
>
> This is something that has been touched on but I think would be good to
> make more explicit/document - do said Hadoop compatibility guarantees mean
> that one does not have to upgrade their existing Hadoop cluster before they
> upgrade to an HBase version with a default Hadoop 3.x version that is minor
> versions ahead of their Hadoop cluster, e.g. for an HBase 2.5.y -> 2.5.z
> patch version upgrade, if Hadoop 3.4.1 is made the default for branch-2.5
> as proposed below and a user is currently running Hadoop 3.2.3? Or is that
> something that may work, but in case of issues a user is expected to fall
> back to compiling their own HBase with a lower Hadoop version as you
> mentioned Istvan?
>
> > > We also have the packaging
> > > (smoke) tests for running the official binaries against earlier Hadoop
> > > clusters, which would catch (some of) the Hadoop wire API incompatibilities
>
> I am admittedly not familiar with the relevant HBase/Hadoop compatibility
> testing that is done or its depth/breadth, and also don't know off the top
> of my head of places outside of the WAL providers which make use of Hadoop
> internals which I would think are particularly Hadoop version sensitive/at
> risk. I stumbled on an older issue HBASE-13740 "Stop using Hadoop private
> interfaces" where Busbey mentions animal sniffer as a possible tool to
> detect the places where HBase is reliant on Hadoop private interfaces. The
> places where reflection is used to make use of newer Hadoop features/APIs
> with fallback to older behavior/APIs are a lot more obvious.
>
> Thank you,
> Daniel
>
> ----- Original Message -----
> From: 张铎 <dev@hbase.apache.org>
> To: dev@hbase.apache.org
> At: 11/15/24 23:35:53 UTC-05:00
>
> +1
>
> Istvan Toth <st...@cloudera.com.invalid> wrote on Fri, Nov 15, 2024 at 14:26:
>
> > With Duo's fix for HBASE-28965, there are no more known blockers.
> >
> > Should I go ahead with making 3.4.1 the default for branch-2.5 and
> > branch-2.6?
> >
> > On Tue, Nov 5, 2024 at 7:16 AM Istvan Toth <st...@cloudera.com> wrote:
> >
> > > I've just committed the - hopefully - last test change, so the nightlies
> > > will not always fail on the packaging tests.
> > >
> > > All non-released branches (i.e. 2.7+) now default to building with 3.4.1.
> > >
> > > I wanted to revisit updating the default version on the released (2.5,
> > > 2.6) branches, because Nick has expressed concerns about it.
> > >
> > > The dev tests are good (the test failures seem to be "normal" flakies). As
> > > we've not updated the default version, we're not yet running the packaging
> > > tests on branch-2.5 and 2.6 against the older Hadoop versions, but we do on
> > > the rest of the branches, and if we update it, we will also run them on
> > > 2.5 and 2.6.
> > >
> > > Without repeating the arguments I have already made for it, I want to add
> > > a new one:
> > >
> > > Security and CVEs are getting more and more emphasis, which is great, but
> > > has some drawbacks.
> > > While the proliferation of static analyzers leads to a lot of frustrating
> > > CVE witch hunts and false positives, the majority of users cannot evaluate
> > > the actual security impact,
> > > or even if they can, they are tied by inflexible policies.
> > >
> > > We at Phoenix had recent security discussions with Trino, and ended up
> > > having to dependencyManage the transitive Hadoop dependencies in our shaded
> > > uberjar to address their concerns.
> > > (Which is an antipattern)
> > >
> > > While updating the Hadoop version in a patch release undoubtedly increases
> > > the risk of regressions, IMO we are protected by the Hadoop backwards
> > > compatibility promises, and we have added a reasonable number of tests to
> > > catch any issues. I am confident in our test coverage when building HBase
> > > with any of the supported Hadoop versions. We also have the packaging
> > > (smoke) tests for running the official binaries against earlier Hadoop
> > > clusters, which would catch (some of) the Hadoop wire API incompatibilities.
> > >
> > > However, IMO not updating Hadoop is a net negative to the project's health,
> > > as the binary (maven or assembly) releases are used primarily either by
> > > other libraries interfacing with HBase, or by new users, POC clusters, etc.,
> > > and these are the use cases where the transitive CVEs can prevent projects
> > > from adding (or maintaining, as in the case of Trino) HBase support, or
> > > discourage new users from adopting HBase.
> > >
> > > Obviously, I have a very skewed view of HBase users, but I think that most
> > > production HBase users either use a vendor or cloud provider version, or
> > > have enough in-house expertise to rebuild HBase with their Hadoop version
> > > if something goes wrong (despite the Hadoop backwards compatibility
> > > promises).
> > >
> > > On Wed, Oct 30, 2024 at 12:41 PM Istvan Toth <st...@cloudera.com> wrote:
> > >
> > >> Thanks!
> > >>
> > >> I will backport the test changes, but keep the default Hadoop version.
> > >>
> > >> We will have more information then.
> > >>
> > >> Istvan
> > >>
> > >> On Wed, Oct 30, 2024 at 10:22 AM Nick Dimiduk <ndimi...@apache.org>
> > >> wrote:
> > >>
> > >>> On Mon, Oct 28, 2024 at 11:00 AM Istvan Toth <st...@cloudera.com.invalid>
> > >>> wrote:
> > >>> >
> > >>> > I have looked at branch-2.5, but the nightly looks off there, as it runs
> > >>> > the packaging tests with Hadoop 3.1.1, which it doesn't even officially
> > >>> > support anymore.
> > >>> >
> > >>> > What should we do with branch-2.5 ?
> > >>> >
> > >>> > I think that it would not be a lot of extra work to backport everything,
> > >>> > both the backwards compatibility tests and defaulting Hadoop to 3.4.1.
> > >>> > We just have to update the version in the pom, and add 3.2.4 to the list
> > >>> > of versions to test for backwards compatibility and integration (and
> > >>> > remove 3.1.1).
> > >>> >
> > >>> > I would prefer to have uniform tests and default to Hadoop 3.4.1 on all
> > >>> > active branches.
> > >>> > Having a (few) final 2.5.x release(s) with tested Hadoop 3.4.x support may
> > >>> > be useful for users for migration and CVE mitigation purposes.
> > >>> >
> > >>> > WDYT ?
> > >>>
> > >>> branch-2.5's default hadoop3 version is 3.2.4. That's a big dependency
> > >>> to change for a patch release.
> > >>> I don't think that we can get away with that change and maintain our
> > >>> compatibility obligations. I'm not up to speed on the current state of
> > >>> CVEs for this older (EOL?) version, so we have that dimension to
> > >>> consider. If the newer version is "drop-in" compatible (and only if),
> > >>> then I have no issue with moving that release line forward. Ultimately
> > >>> it's the release manager for 2.5 to make a determination, so I defer to
> > >>> Andrew's assessment.
> > >>>
> > >>> I am in favor of back-porting the improved testing coverage you've
> > >>> added to branch-2.5. It would be great to understand if branch-2.5 (1)
> > >>> compiled against 3.2.4 will run on Hadoop 3.4.1 and (2) builds and
> > >>> tests out on 3.4.1. That will give the more security-minded users
> > >>> additional confidence in bumping their hadoop dependency on their own.
> > >>>
> > >>> Thanks,
> > >>> Nick
> > >>>
> > >>> > On Mon, Oct 21, 2024 at 6:54 PM Istvan Toth <st...@cloudera.com>
> > >>> > wrote:
> > >>> >
> > >>> > > We could also move the default to 3.4.1 directly.
> > >>> > > We already test for 3.4.0 in the nightly job.
> > >>> > >
> > >>> > > On Mon, Oct 21, 2024 at 3:49 PM 张铎(Duo Zhang) <palomino...@gmail.com>
> > >>> > > wrote:
> > >>> > >
> > >>> > >> And seems hadoop 3.4.1 is out. We could see whether to bump to this
> > >>> > >> version later?
> > >>> > >>
> > >>> > >> Istvan Toth <st...@cloudera.com.invalid> wrote on Mon, Oct 21, 2024
> > >>> > >> at 20:56:
> > >>> > >> >
> > >>> > >> > I have merged the new tests to the nightly Jenkins runs on master.
> > >>> > >> >
> > >>> > >> > They have identified another 3.4.0 incompatibility:
> > >>> > >> > HBASE-28929 <https://issues.apache.org/jira/browse/HBASE-28929>
> > >>> > >> >
> > >>> > >> > I will hold off backporting the test changes until HBASE-28929 is
> > >>> > >> resolved.