I strive to meet that stated compatibility goal when I release Hadoop,
but we don't have a rigorous compatibility/upgrade test in Hadoop, so YMMV
(we now have one in Ozone!).

There are so many gotchas that it really depends on the RM to do the
hard work: checking protobuf definitions, running the API compatibility
report, and compiling against downstream applications.
The other thing is third-party dependency updates. Whenever I bump the
Netty or Jetty version, new transitive dependencies slip in as part of
the update, which sometimes breaks HBase because of the dependency check
in its shaded artifacts.

On Mon, Sep 16, 2024 at 4:48 AM Istvan Toth <st...@cloudera.com.invalid>
wrote:

> On Wed, Sep 11, 2024 at 4:30 PM 张铎(Duo Zhang) <palomino...@gmail.com>
> wrote:
>
> > There is a problem that, usually, you can use an old Hadoop client to
> > communicate with a new Hadoop server, but not vice versa.
> >
>
> Do we have examples of that?
>
>
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html
> specifically states otherwise:
>
> In addition to the limitations imposed by being Stable
> <https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/InterfaceClassification.html#Stable>,
> Hadoop’s wire protocols MUST also be forward compatible across minor
> releases within a major version according to the following:
>
>    - Client-Server compatibility MUST be maintained so as to allow users
>    to continue using older clients even after upgrading the server
>    (cluster) to a later version (or vice versa). For example, a Hadoop
>    2.1.0 client talking to a Hadoop 2.3.0 cluster.
>    - Client-Server compatibility MUST be maintained so as to allow users
>    to upgrade the client before upgrading the server (cluster). For
>    example, a Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This
>    allows deployment of client-side bug fixes ahead of full cluster
>    upgrades. Note that new cluster features invoked by new client APIs or
>    shell commands will not be usable. YARN applications that attempt to
>    use new APIs (including new fields in data structures) that have not
>    yet been deployed to the cluster can expect link exceptions.
>    - Client-Server compatibility MUST be maintained so as to allow
>    upgrading individual components without upgrading others. For example,
>    upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
>    - Server-Server compatibility MUST be maintained so as to allow mixed
>    versions within an active cluster so the cluster may be upgraded
>    without downtime in a rolling fashion.
>
> Admittedly, I don't have a lot of experience with mismatched Hadoop
> versions, but my proposal should be covered by the second clause.
>
> Usage of newer APIs should be caught when compiling with older Hadoop
> versions.
> The only risk I can see is when we use a new feature that was added
> without changing the API signature (such as a new constant value for
> some new behaviour).
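>
> As a minimal sketch of that risk (the config key and value below are
> hypothetical, not a real Hadoop setting):
>
>     import org.apache.hadoop.conf.Configuration;
>
>     public final class NewConstantRisk {
>       static void enableNewBehaviour(Configuration conf) {
>         // Compiles against every Hadoop release, because the signature
>         // of Configuration.set(String, String) never changed; but an
>         // older cluster does not understand the new value, and neither
>         // javac nor hadoopcheck can flag that at build time.
>         conf.set("dfs.client.write.policy", "NEW_POLICY_ADDED_IN_3_4");
>       }
>     }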
>
>
> > When deploying HBase, HBase itself acts as a client of Hadoop; that's
> > why we always stay on the oldest supported Hadoop version.
> >
> >
> Not true for 2.6, which according to the docs supports Hadoop 3.2, but
> defaults to Hadoop 3.3.
>
>
> > For me, technically I think bumping to the newest patch release of a
> > minor release should be fine, which is proposal 1.
> >
> > But the current hadoopcheck is not enough, since it can only ensure
> > that there is no compilation error.
> > Maybe we should also run some simple dev tests in the hadoopcheck
> > stage, and in integration tests we should try to build with all the
> > supported Hadoop versions and run basic read/write tests.
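> >
> > A basic read/write smoke test for that stage could be as small as the
> > following sketch (the table and column family names are hypothetical;
> > it assumes a pre-created table "smoke_test" with family "f"):
> >
> >     import org.apache.hadoop.conf.Configuration;
> >     import org.apache.hadoop.hbase.HBaseConfiguration;
> >     import org.apache.hadoop.hbase.TableName;
> >     import org.apache.hadoop.hbase.client.Connection;
> >     import org.apache.hadoop.hbase.client.ConnectionFactory;
> >     import org.apache.hadoop.hbase.client.Get;
> >     import org.apache.hadoop.hbase.client.Put;
> >     import org.apache.hadoop.hbase.client.Result;
> >     import org.apache.hadoop.hbase.client.Table;
> >     import org.apache.hadoop.hbase.util.Bytes;
> >
> >     public final class BasicReadWriteSmokeTest {
> >       public static void main(String[] args) throws Exception {
> >         Configuration conf = HBaseConfiguration.create();
> >         try (Connection conn = ConnectionFactory.createConnection(conf);
> >             Table table = conn.getTable(TableName.valueOf("smoke_test"))) {
> >           byte[] row = Bytes.toBytes("r1");
> >           byte[] cf = Bytes.toBytes("f");
> >           byte[] q = Bytes.toBytes("q");
> >           byte[] v = Bytes.toBytes("v1");
> >           // Write one cell, then read it back through the same API.
> >           table.put(new Put(row).addColumn(cf, q, v));
> >           Result result = table.get(new Get(row));
> >           if (!Bytes.equals(v, result.getValue(cf, q))) {
> >             throw new AssertionError("read/write smoke test failed");
> >           }
> >         }
> >       }
> >     }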
>
>
> Do we need to test all versions?
> If we test with, say, 3.3.0 and 3.3.6, do we need to test with 3.3.[1-5]?
> Or if we test with 3.2.5 and 3.3.6, do we need to test with any of the
> interim versions?
>
> Basically, how much do we trust Hadoop to keep to its compatibility rules?
>
> Running a limited number of tests should not be a problem.
> Should we add a new test category, so that they can be easily started
> from Maven?
>
> Can you suggest some tests that we should run for the compatibility check?
>
>
> > Thanks.
> >
> > Istvan Toth <st...@cloudera.com.invalid> wrote on Wed, Sep 11, 2024 at 21:05:
> > >
> > > Let me summarize my take on the discussion so far:
> > >
> > > There are two aspects to the Hadoop version we build with:
> > > 1. Source code quality/compatibility
> > > 2. Security and usability of the public binary assemblies and (shaded)
> > > HBase Maven artifacts.
> > >
> > > 1. Source code quality/compatibility
> > >
> > > AFAICT we have the following hard goals:
> > > 1.a: Ensure that HBase compiles and runs well with the earliest
> > > supported Hadoop version on the given branch
> > > 1.b: Ensure that HBase compiles and runs well with the latest
> > > supported Hadoop version on the given branch
> > >
> > > In my opinion we should also strive for these goals:
> > > 1.c: Aim to officially support the newest possible Hadoop releases
> > > 1.d: Take advantage of new features in newer Hadoop versions
> > >
> > > 2. Public binary usability wish list:
> > >
> > > 2.a: We want them to work OOB for as many use cases as possible
> > > 2.b: We want them to work as well as possible
> > > 2.c: We want to have as few CVEs in them as possible
> > > 2.d: We want to make upgrades as painless as possible, especially
> > > for patch releases
> > >
> > > The fact that Hadoop does not have an explicit end-of-life policy of
> > > course complicates things.
> > >
> > > Our current policy seems to be that we pick a Hadoop version to
> > > build with when releasing a minor version, and stay on that version
> > > until there is a newer patch release of that minor version with
> > > direct CVE fixes.
> > > This does not seem to be absolute; for example, the recently released
> > > HBase 2.4.18 still defaults to Hadoop 3.1.2, which has several old
> > > CVEs, many of which are reportedly fixed in 3.1.3 and 3.1.4.
> > >
> > > My proposals are:
> > >
> > > Proposal 1:
> > >
> > > Whenever a new Hadoop patch release comes out for a minor version,
> > > then unless it breaks source compatibility, we should automatically
> > > update the default Hadoop version for all branches that use the same
> > > minor version.
> > > The existing hadoopcheck mechanism should be good enough to guarantee
> > > that we do not break compatibility with the earlier patch releases.
> > >
> > > This would ensure that the binaries use the latest and greatest
> > > Hadoop (of that minor branch), that users of the binaries get the
> > > latest fixes, both CVE- and functionality-wise, and that the binaries
> > > also get the transitive CVE fixes in that release.
> > > For example, if we did this we could use the new 3.3.6 feature from
> > > HBASE-27769 (via reflection) and also test it, thereby improving
> > > Ozone support.
> > >
> > > On the other hand, we minimize changes and maximize compatibility by
> > > sticking to the same Hadoop minor release.
> > >
> > > Proposal 2:
> > >
> > > We should default to the latest Hadoop version (currently 3.4.0) on
> > > unreleased branches.
> > > This should ensure that when we do release, we default to the latest
> > > version and have tested it as thoroughly as possible.
> > >
> > > Again, the existing hadoopcheck mechanism should ensure that we do
> > > not break compatibility with earlier supported versions.
> > >
> > > Istvan
> > >
> > >
> > >
> > >
> > > On Mon, Sep 9, 2024 at 9:41 PM Nick Dimiduk <ndimi...@apache.org> wrote:
> > >
> > > > Yes, we’ll use reflection to make use of APIs introduced in newer
> > > > HDFS versions than the stated dependency until the stated
> > > > dependency finally catches up.
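> > > >
> > > > A minimal sketch of that pattern (the probed method name below is
> > > > hypothetical, standing in for whatever new API we want to use):
> > > >
> > > >     import java.lang.reflect.Method;
> > > >     import org.apache.hadoop.fs.FileSystem;
> > > >
> > > >     public final class ReflectiveNewApi {
> > > >       // Probe once; null means an older Hadoop is on the classpath.
> > > >       private static final Method NEW_API = probe();
> > > >
> > > >       private static Method probe() {
> > > >         try {
> > > >           return FileSystem.class.getMethod("newApiAddedIn336");
> > > >         } catch (NoSuchMethodException e) {
> > > >           return null;
> > > >         }
> > > >       }
> > > >
> > > >       static Object callOrFallback(FileSystem fs) throws Exception {
> > > >         if (NEW_API != null) {
> > > >           return NEW_API.invoke(fs); // newer Hadoop: use the new API
> > > >         }
> > > >         return legacyPath(fs);       // older Hadoop: old behaviour
> > > >       }
> > > >
> > > >       private static Object legacyPath(FileSystem fs) {
> > > >         return null; // placeholder for the pre-3.3.6 code path
> > > >       }
> > > >     }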
> > > >
> > > > On Mon, 9 Sep 2024 at 19:55, Wei-Chiu Chuang <weic...@apache.org> wrote:
> > > >
> > > > > Reflection is probably the way to go to ensure maximum
> > > > > compatibility, TBH.
> > > > >
> > > > > On Mon, Sep 9, 2024 at 10:40 AM Istvan Toth
> > > > > <st...@cloudera.com.invalid> wrote:
> > > > >
> > > > > > Stephen Wu has kindly sent me the link to the previous email
> > > > > > thread:
> > > > > > https://lists.apache.org/thread/2k4tvz3wpg06sgkynkhgvxrodmj86vsj
> > > > > >
> > > > > > Reading it, I cannot see anything there that would contraindicate
> > > > > > upgrading to 3.3.6 from 3.3.5, at least on the branches that
> > > > > > already default to 3.3.5, i.e. 2.6+.
> > > > > >
> > > > > > At first glance, the new logic in HBASE-27769 could also be
> > > > > > implemented with the usual reflection hacks, while preserving
> > > > > > the old logic for Hadoop 3.3.5 and earlier.
> > > > > >
> > > > > > Thanks,
> > > > > > Istvan
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Sep 9, 2024 at 1:42 PM Istvan Toth <st...@cloudera.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for your reply, Nick.
> > > > > > >
> > > > > > > There are no listed direct CVEs in either Hadoop 3.2.4 or
> > > > > > > 3.3.5, but there are CVEs in their transitive dependencies.
> > > > > > >
> > > > > > > My impression is that rather than shipping the oldest 'safe'
> > > > > > > version, HBase does seem to update the default Hadoop version
> > > > > > > to the latest-ish at the start of the release process;
> > > > > > > otherwise 2.6 would still default to 3.2.4. (The HBase 2.6
> > > > > > > release was already underway when Hadoop 3.4.0 was released.)
> > > > > > >
> > > > > > > For now, we (Phoenix) have resorted to dependency-managing
> > > > > > > the transitive dependencies coming in (only) via Hadoop in
> > > > > > > Phoenix, but that is a slippery slope and adds a layer of
> > > > > > > uncertainty, as it may introduce incompatibilities in Hadoop
> > > > > > > that we don't have tests for.
> > > > > > >
> > > > > > > Our situation is similar to that of the HBase shaded
> > > > > > > artifacts: we ship a huge uberjar that includes much of both
> > > > > > > HBase and Hadoop on top of (or rather below) Phoenix, similar
> > > > > > > to the hbase-shaded-client jar.
> > > > > > >
> > > > > > > I will look into the hadoopcheck CI tests that you've
> > > > > > > mentioned, then try to resurrect HBASE-27931, and if I don't
> > > > > > > find any issues and there are no objections, I will put up a
> > > > > > > PR to update the unreleased branches to default to 3.4.0.
> > > > > > >
> > > > > > > Istvan
> > > > > > >
> > > > > > > On Mon, Sep 9, 2024 at 11:06 AM Nick Dimiduk
> > > > > > > <ndimi...@apache.org> wrote:
> > > > > > >
> > > > > > >> My understanding of our hadoop dependency policy is that we
> > > > > > >> ship poms with hadoop versions pinned to the oldest
> > > > > > >> compatible, "safe" version that is supported. Our test
> > > > > > >> infrastructure has a "hadoop check" procedure that does some
> > > > > > >> validation against other patch release versions.
> > > > > > >>
> > > > > > >> I don't know if anyone has done a CVE sweep recently. If
> > > > > > >> there are new CVEs, we do bump the minimum supported version
> > > > > > >> specified in the pom as part of patch releases. These changes
> > > > > > >> need to include a pretty thorough compatibility check so that
> > > > > > >> we can include release notes about any introduced
> > > > > > >> incompatibilities.
> > > > > > >>
> > > > > > >> I am in favor of a dependency bump so as to address known
> > > > > > >> CVEs as best as we reasonably can.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Nick
> > > > > > >>
> > > > > > >> On Mon, Sep 9, 2024 at 10:59 AM Istvan Toth
> > > > > > >> <st...@apache.org> wrote:
> > > > > > >>
> > > > > > >> > Hi!
> > > > > > >> >
> > > > > > >> > I'm working on building the Phoenix uberjars with newer
> > > > > > >> > Hadoop versions by default to improve their CVE stance, and
> > > > > > >> > I realized that HBase itself does not use the latest
> > > > > > >> > releases.
> > > > > > >> >
> > > > > > >> > branch-2.5 defaults to 3.2.4
> > > > > > >> > branch-2.6 and later defaults to 3.3.5
> > > > > > >> >
> > > > > > >> > I can kind of understand that we don't want to bump the
> > > > > > >> > minor version for branch-2.5 from the one it was released
> > > > > > >> > with.
> > > > > > >> >
> > > > > > >> > However, I don't see the rationale for not upgrading
> > > > > > >> > branch-2.6 to at least 3.3.6, and the unreleased branches
> > > > > > >> > (branch-2, branch-3, master) to 3.4.0.
> > > > > > >> >
> > > > > > >> > I found a mention of wanting to stay off the latest patch
> > > > > > >> > release in HBASE-27931, but I could not figure out whether
> > > > > > >> > it has a technical reason, or whether this is a written (or
> > > > > > >> > unwritten) policy.
> > > > > > >> >
> > > > > > >> > Best regards,
> > > > > > >> > Istvan
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>
>
