I don't think you'll need to manage downloading anything. Just run something like `mvn clean test -Dhadoop.version=... -PrunDevTests=true` for each Hadoop version we need to check. The actual command will be a little more complex than that, thanks to our flaky-test exclusions. It would probably be good if this were integrated into the Yetus personality (dev-support/hbase-personality.sh).
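For illustration, a minimal sketch of what that per-version loop could look like. The version list is hypothetical, and a real run would also need the flaky-test exclusion flags from the personality script, so the mvn command is only printed here rather than executed:

```shell
# Sketch only: iterate over candidate Hadoop versions and build the devTests
# invocation for each. The versions below are examples, not an official list.
HADOOP_VERSIONS="3.3.5 3.3.6 3.4.0"

for hv in $HADOOP_VERSIONS; do
  # A real personality integration would execute this instead of printing it,
  # and would append the flaky-test exclusion properties.
  echo "mvn clean test -Dhadoop.version=${hv} -PrunDevTests=true"
done
```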
This is going to add a lot of time to our nightlies for branch-2...

On Mon, Sep 16, 2024 at 3:45 PM Istvan Toth <st...@cloudera.com.invalid> wrote:

> OK, I'm gonna look into downloading multiple Hadoop 3 versions, and running those tests with each one.
>
> On Mon, Sep 16, 2024 at 3:08 PM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
>
> > And if we can make sure the compatibility, I agree that we could depend on the newest possible hadoop version by default. As you said, it can reduce most transitive security issues.
> >
> > There are still 3 security issues on master branch because of netty 3, which should be fixed in 3.4.0.
> >
> > 张铎(Duo Zhang) <palomino...@gmail.com> wrote on Mon, Sep 16, 2024 at 21:03:
> >
> > > There is a devTests profile in our pom, we can make use of it first.
> > >
> > > And on integration tests, I mean this one:
> > > https://github.com/apache/hbase/blob/4446d297112899dab59c0952489457c4419366d3/dev-support/Jenkinsfile#L755
> > >
> > > We could extend this test to test different combinations.
> > >
> > > Istvan Toth <st...@cloudera.com.invalid> wrote on Mon, Sep 16, 2024 at 19:48:
> > >
> > > > On Wed, Sep 11, 2024 at 4:30 PM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
> > > >
> > > > > There is a problem that, usually, you can use an old hadoop client to communicate with a new hadoop server, but not vice versa.
> > > >
> > > > Do we have examples of that?
> > > > https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html
> > > > specifically states otherwise:
> > > >
> > > > In addition to the limitations imposed by being Stable
> > > > <https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/InterfaceClassification.html#Stable>,
> > > > Hadoop's wire protocols MUST also be forward compatible across minor releases within a major version according to the following:
> > > >
> > > > - Client-Server compatibility MUST be maintained so as to allow users to continue using older clients even after upgrading the server (cluster) to a later version (or vice versa). For example, a Hadoop 2.1.0 client talking to a Hadoop 2.3.0 cluster.
> > > > - Client-Server compatibility MUST be maintained so as to allow users to upgrade the client before upgrading the server (cluster). For example, a Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows deployment of client-side bug fixes ahead of full cluster upgrades. Note that new cluster features invoked by new client APIs or shell commands will not be usable. YARN applications that attempt to use new APIs (including new fields in data structures) that have not yet been deployed to the cluster can expect link exceptions.
> > > > - Client-Server compatibility MUST be maintained so as to allow upgrading individual components without upgrading others. For example, upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
> > > > - Server-Server compatibility MUST be maintained so as to allow mixed versions within an active cluster so the cluster may be upgraded without downtime in a rolling fashion.
> > > > Admittedly, I don't have a lot of experience with mismatched Hadoop versions, but my proposal should be covered by the second clause.
> > > >
> > > > Usage of newer APIs should be caught when compiling with older Hadoop versions.
> > > > The only risk I can see is when we use a new feature which was added without changing the API signature (such as adding a new constant value for some new behaviour).
> > > >
> > > > > When deploying HBase, HBase itself acts as a client of hadoop, that's why we always stay on the oldest supported hadoop version.
> > > >
> > > > Not true for 2.6, which according to the docs supports Hadoop 3.2, but defaults to Hadoop 3.3.
> > > >
> > > > > For me, technically I think bumping to the newest patch release of a minor release should be fine, which is the proposal 1.
> > > > >
> > > > > But the current hadoopcheck is not enough, since it can only ensure that there is no compilation error.
> > > > > Maybe we should also run some simple dev tests in the hadoopcheck stage, and in integration tests, we should try to build with all the supported hadoop versions and run the basic read write tests.
> > > >
> > > > Do we need to test all versions?
> > > > If we test with, say, 3.3.0 and 3.3.6, do we need to test with 3.3.[1-5]?
> > > > Or if we test with 3.2.5 and 3.3.6, do we need to test with any of the interim versions?
> > > >
> > > > Basically, how much do we trust Hadoop to keep to its compatibility rules?
> > > >
> > > > Running a limited number of tests should not be a problem.
> > > > Should we add a new test category, so that they can be easily started from Maven?
> > > >
> > > > Can you suggest some tests that we should run for the compatibility check?
> > > >
> > > > > Thanks.
> > > > > Istvan Toth <st...@cloudera.com.invalid> wrote on Wed, Sep 11, 2024 at 21:05:
> > > > >
> > > > > > Let me summarize my take of the discussion so far:
> > > > > >
> > > > > > There are two aspects to the HBase version we build with:
> > > > > > 1. Source code quality/compatibility
> > > > > > 2. Security and usability of the public binary assemblies and (shaded) hbase maven artifacts.
> > > > > >
> > > > > > 1. Source code quality/compatibility
> > > > > >
> > > > > > AFAICT we have the following hard goals:
> > > > > > 1.a: Ensure that HBase compiles and runs well with the earliest supported Hadoop version on the given branch
> > > > > > 1.b: Ensure that HBase compiles and runs well with the latest supported Hadoop version on the given branch
> > > > > >
> > > > > > In my opinion we should also strive for these goals:
> > > > > > 1.c: Aim to officially support the newest possible Hadoop releases
> > > > > > 1.d: Take advantage of new features in newer Hadoop versions
> > > > > >
> > > > > > 2. Public binary usability wish list:
> > > > > >
> > > > > > 2.a: We want them to work OOB for as many use cases as possible
> > > > > > 2.b: We want them to work as well as possible
> > > > > > 2.c: We want to have as few CVEs in them as possible
> > > > > > 2.d: We want to make upgrades as painless as possible, especially for patch releases
> > > > > >
> > > > > > The fact that Hadoop does not have an explicit end-of-life policy of course complicates things.
> > > > > >
> > > > > > Our current policy seems to be that we pick a Hadoop version to build with when releasing a minor version, and stay on that version until there is a newer patch release of that minor version with direct CVE fixes.
> > > > > > This does not seem to be absolute; for example, the recently released HBase 2.4.18 still defaults to Hadoop 3.1.2, which has several old CVEs, many of which are reportedly fixed in 3.1.3 and 3.1.4.
> > > > > >
> > > > > > My proposals are:
> > > > > >
> > > > > > Proposal 1:
> > > > > >
> > > > > > Whenever a new Hadoop patch release is released for a minor version, then unless it breaks source compatibility, we should automatically update the default Hadoop version for all branches that use the same minor version.
> > > > > > The existing hadoopcheck mechanism should be good enough to guarantee that we do not break compatibility with the earlier patch releases.
> > > > > >
> > > > > > This would ensure that the binaries use the latest and greatest Hadoop (of that minor branch), that users of the binaries get the latest fixes, both CVE and functionality wise, and that the binaries also get the transitive CVE fixes in that release.
> > > > > > For example, if we did this we could use the new feature in 3.3.6 in HBASE-27769 (via reflection) and also test it, thereby improving Ozone support.
> > > > > >
> > > > > > On the other hand, we minimize changes and maximize compatibility by sticking to the same Hadoop minor release.
> > > > > >
> > > > > > Proposal 2:
> > > > > >
> > > > > > We should default to the latest hadoop version (currently 3.4.0) on unreleased branches.
> > > > > > This should ensure that when we do release we default to the latest version, and we've tested it as thoroughly as possible.
> > > > > >
> > > > > > Again, the existing hadoopcheck mechanism should ensure that we do not break compatibility with earlier supported versions.
> > > > > > Istvan
> > > > > >
> > > > > > On Mon, Sep 9, 2024 at 9:41 PM Nick Dimiduk <ndimi...@apache.org> wrote:
> > > > > >
> > > > > > > Yes, we'll use reflection to make use of APIs introduced in newer HDFS versions than the stated dependency until the stated dependency finally catches up.
> > > > > > >
> > > > > > > On Mon, 9 Sep 2024 at 19:55, Wei-Chiu Chuang <weic...@apache.org> wrote:
> > > > > > >
> > > > > > > > Reflection is probably the way to go to ensure maximum compatibility TBH
> > > > > > > >
> > > > > > > > On Mon, Sep 9, 2024 at 10:40 AM Istvan Toth <st...@cloudera.com.invalid> wrote:
> > > > > > > >
> > > > > > > > > Stephen Wu has kindly sent me the link for the previous email thread:
> > > > > > > > > https://lists.apache.org/thread/2k4tvz3wpg06sgkynkhgvxrodmj86vsj
> > > > > > > > >
> > > > > > > > > Reading it, I cannot see anything there that would contraindicate upgrading to 3.3.6 from 3.3.5, at least on the branches that already default to 3.3.5, i.e. 2.6+.
> > > > > > > > >
> > > > > > > > > At first glance, the new logic in HBASE-27769 could also be implemented with the usual reflection hacks, while preserving the old logic for Hadoop 3.3.5 and earlier.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Istvan
> > > > > > > > >
> > > > > > > > > On Mon, Sep 9, 2024 at 1:42 PM Istvan Toth <st...@cloudera.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for your reply, Nick.
> > > > > > > > > > There are no listed direct CVEs in either Hadoop 3.2.4 or 3.3.5, but there are CVEs in their transitive dependencies.
> > > > > > > > > >
> > > > > > > > > > My impression is that, rather than shipping the oldest 'safe' version, HBase does seem to update the default Hadoop version to the latest-ish at the time of the start of the release process; otherwise 2.6 would still default to 3.2.4. (HBase 2.6 release was already underway when Hadoop 3.4.0 was released)
> > > > > > > > > >
> > > > > > > > > > For now, we (Phoenix) have resorted to dependency-managing transitive dependencies coming in (only) via Hadoop in Phoenix, but that is a slippery slope, and adds a layer of uncertainty, as it may introduce incompatibilities in Hadoop that we don't have tests for.
> > > > > > > > > >
> > > > > > > > > > Our situation is similar to that of the HBase shaded artifacts, where we ship a huge uberjar that includes much of both HBase and Hadoop on top of (or rather below) Phoenix, similar to the hbase-shaded-client jar.
> > > > > > > > > >
> > > > > > > > > > I will look into the hadoop check CI tests that you've mentioned, then I will try to resurrect HBASE-27931, and if I don't find any issues, and there are no objections, then I will put up a PR to update the unreleased version to default to 3.4.0.
> > > > > > > > > > Istvan
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 9, 2024 at 11:06 AM Nick Dimiduk <ndimi...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > >> My understanding of our hadoop dependency policy is that we ship poms with hadoop versions pinned to the oldest compatible, "safe" version that is supported. Our test infrastructure has a "hadoop check" procedure that does some validation against other patch release versions.
> > > > > > > > > >>
> > > > > > > > > >> I don't know if anyone has done a CVE sweep recently. If there are new CVEs, we do bump the minimum supported version specified in the pom as part of patch releases. These changes need to include a pretty thorough compatibility check so that we can include release notes about any introduced incompatibilities.
> > > > > > > > > >>
> > > > > > > > > >> I am in favor of a dependency bump so as to address known CVEs as best as we reasonably can.
> > > > > > > > > >>
> > > > > > > > > >> Thanks,
> > > > > > > > > >> Nick
> > > > > > > > > >>
> > > > > > > > > >> On Mon, Sep 9, 2024 at 10:59 AM Istvan Toth <st...@apache.org> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi!
> > > > > > > > > >> >
> > > > > > > > > >> > I'm working on building the Phoenix uberjars with newer Hadoop versions by default to improve its CVE stance, and I realized that HBase itself does not use the latest releases.
> > > > > > > > > >> >
> > > > > > > > > >> > branch-2.5 defaults to 3.2.4
> > > > > > > > > >> > branch-2.6 and later defaults to 3.3.5
> > > > > > > > > >> >
> > > > > > > > > >> > I can kind of understand that we don't want to bump the minor version for branch-2.5 from the one it was released with.
> > > > > > > > > >> >
> > > > > > > > > >> > However, I don't see the rationale for not upgrading branch-2.6 to at least 3.3.6, and the unreleased branches (branch-2, branch-3, master) to 3.4.0.
> > > > > > > > > >> >
> > > > > > > > > >> > I found a mention of wanting to stay off the latest patch release in HBASE-27931, but I could not figure out whether it has a technical reason, or if this is a written (or unwritten) policy.
> > > > > > > > > >> >
> > > > > > > > > >> > best regards
> > > > > > > > > >> > Istvan
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > *István Tóth* | Sr. Staff Software Engineer
> > > > > > > > > > *Email*: st...@cloudera.com
> > > > > > > > > > cloudera.com <https://www.cloudera.com>