Let me summarize my take on the discussion so far:

There are two aspects to the HBase version we build with:
1. Source code quality/compatibility
2. Security and usability of the public binary assemblies and (shaded)
HBase Maven artifacts.

1. Source code quality/compatibility

AFAICT we have the following hard goals:
1.a: Ensure that HBase compiles and runs well with the earliest supported
Hadoop version on the given branch
1.b: Ensure that HBase compiles and runs well with the latest supported
Hadoop version on the given branch

In my opinion we should also strive for these goals:
1.c: Aim to officially support the newest possible Hadoop releases
1.d: Take advantage of new features in newer Hadoop versions

2. Public binary usability wish list:

2.a: We want them to work OOB for as many use cases as possible
2.b: We want them to work as well as possible
2.c: We want to have as few CVEs in them as possible
2.d: We want to make upgrades as painless as possible, especially for patch
releases

The fact that Hadoop does not have an explicit end-of-life policy of
course complicates things.

Our current policy seems to be that we pick a Hadoop version to build with
when releasing a minor version, and stay on that version until a newer
patch release of that Hadoop minor version comes out with direct CVE fixes.
This does not seem to be an absolute rule: for example, the recently
released HBase 2.4.18 still defaults to Hadoop 3.1.2, which has several old
CVEs, many of which are reportedly fixed in 3.1.3 and 3.1.4.

My proposals are:

Proposal 1:

Whenever a new Hadoop patch release comes out for a minor version, then
unless it breaks source compatibility, we should automatically update the
default Hadoop version on all branches that use the same minor version.
The existing hadoopcheck mechanism should be good enough to guarantee that
we do not break compatibility with the earlier patch releases.
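
The selection rule in this proposal could be sketched roughly as below.
This is only an illustration: HadoopPatchPicker and its methods are
hypothetical names, not part of HBase or its build, and the sketch assumes
plain major.minor.patch version strings. It does not model the source
compatibility check, which would still have to gate the actual bump.

```java
import java.util.Comparator;
import java.util.List;

/** Sketch only: picks the newest patch release within one Hadoop minor line. */
final class HadoopPatchPicker {

  /** Parses "major.minor.patch" into three ints; rejects anything else. */
  static int[] parse(String version) {
    String[] parts = version.split("\\.");
    if (parts.length != 3) {
      throw new IllegalArgumentException("expected major.minor.patch: " + version);
    }
    return new int[] {
      Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])
    };
  }

  /**
   * Returns the highest released patch version that shares major.minor with
   * currentDefault, or currentDefault itself if nothing newer is available.
   */
  static String latestPatchInSameMinor(String currentDefault, List<String> released) {
    int[] cur = parse(currentDefault);
    return released.stream()
        // Stay on the same minor line (Proposal 1 never crosses minor versions).
        .filter(v -> {
          int[] p = parse(v);
          return p[0] == cur[0] && p[1] == cur[1];
        })
        .max(Comparator.comparingInt(v -> parse(v)[2]))
        // Only move forward, never back to an older patch release.
        .filter(v -> parse(v)[2] > cur[2])
        .orElse(currentDefault);
  }
}
```

With 3.2.4, 3.3.5, 3.3.6 and 3.4.0 as the released versions, a branch that
defaults to 3.3.5 would move to 3.3.6, while a branch on 3.2.4 would stay
where it is.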

This would ensure that the binaries use the latest and greatest Hadoop (of
that minor branch) and that users of the binaries get the latest fixes,
both CVE and functionality-wise, and the binaries also get the transitive
CVE fixes in that release.
For example, if we did this, we could use the new feature in 3.3.6 in
HBASE-27769 (via reflection) and also test it, thereby improving Ozone
support.
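
To make the "via reflection" part concrete, here is a minimal sketch of the
general pattern, not the actual HBASE-27769 change. OptionalApi and
callIfPresent are hypothetical names, and the sketch only handles public
no-argument methods: it calls the newer method when the runtime class has
it, and otherwise falls back to the old logic.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.function.Supplier;

/** Sketch only: use a newer API when present, keep the old path otherwise. */
final class OptionalApi {

  /**
   * Invokes the public no-arg method named methodName on target if the
   * runtime class provides it; otherwise returns the result of fallback,
   * which stands in for the logic used with older library versions.
   */
  static Object callIfPresent(Object target, String methodName, Supplier<Object> fallback) {
    Method method;
    try {
      method = target.getClass().getMethod(methodName);
    } catch (NoSuchMethodException e) {
      // Older library version on the classpath: the method does not exist.
      return fallback.get();
    }
    try {
      return method.invoke(target);
    } catch (InvocationTargetException e) {
      // The method exists but threw; surface the underlying cause.
      throw new IllegalStateException(methodName + " failed", e.getCause());
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(methodName + " is not accessible", e);
    }
  }
}
```

A real implementation would usually look the Method up once and cache it
rather than resolving it on every call. Building and testing against the
newer Hadoop release by default is what lets the non-fallback branch be
exercised in CI.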

On the other hand we minimize changes and maximize compatibility by
sticking to the same Hadoop minor release.

Proposal 2:

We should default to the latest Hadoop version (currently 3.4.0) on
unreleased branches.
This should ensure that when we do release we default to the latest
version, and we've tested it as thoroughly as possible.

Again, the existing hadoopcheck mechanism should ensure that we do not
break compatibility with earlier supported versions.

Istvan




On Mon, Sep 9, 2024 at 9:41 PM Nick Dimiduk <ndimi...@apache.org> wrote:

> Yes, we’ll use reflection to make use of APIs introduced in newer HDFS
> versions than the stated dependency until the stated dependency finally
> catches up.
>
> On Mon, 9 Sep 2024 at 19:55, Wei-Chiu Chuang <weic...@apache.org> wrote:
>
> > Reflection is probably the way to go to ensure maximum compatibility TBH
> >
> > On Mon, Sep 9, 2024 at 10:40 AM Istvan Toth <st...@cloudera.com.invalid>
> > wrote:
> >
> > > Stephen Wu has kindly sent me the link for the previous email thread:
> > > https://lists.apache.org/thread/2k4tvz3wpg06sgkynkhgvxrodmj86vsj
> > >
> > > Reading it, I cannot see anything there that would contraindicate
> > > upgrading to 3.3.6 from 3.3.5, at least on the branches that already
> > > default to 3.3.5, i.e. 2.6+.
> > >
> > > At first glance, the new logic in HBASE-27769 could also be implemented
> > > with the usual reflection hacks, while preserving the old logic for
> > > Hadoop 3.3.5 and earlier.
> > >
> > > Thanks,
> > > Istvan
> > >
> > >
> > >
> > > On Mon, Sep 9, 2024 at 1:42 PM Istvan Toth <st...@cloudera.com> wrote:
> > >
> > > > Thanks for your reply, Nick.
> > > >
> > > > There are no listed direct CVEs in either Hadoop 3.2.4 or 3.3.5, but
> > > > there are CVEs in their transitive dependencies.
> > > >
> > > > My impression is that rather than shipping the oldest 'safe' version,
> > > > HBase does seem to update the default Hadoop version to the
> > > > latest-ish at the time of the start of the release process, otherwise
> > > > 2.6 would still default to 3.2.4. (HBase 2.6 release was already
> > > > underway when Hadoop 3.4.0 was released)
> > > >
> > > > For now, we (Phoenix) have resorted to dependency managing transitive
> > > > dependencies coming in (only) via Hadoop in Phoenix,
> > > > but that is a slippery slope, and adds a layer of uncertainty, as it
> > > > may introduce incompatibilities in Hadoop that we don't have tests for.
> > > >
> > > > Our situation is similar to that of the HBase shaded artifacts, where
> > > > we ship a huge uberjar that includes much of both HBase and Hadoop on
> > > > top of (or rather below) Phoenix, similar to the hbase-client-shaded
> > > > jar.
> > > >
> > > > I will look into the hadoop check CI tests that you've mentioned,
> > > > then I will try to resurrect HBASE-27931, and if I don't find any
> > > > issues, and there are no objections, then I will put up a PR to
> > > > update the unreleased version to default to 3.4.0.
> > > >
> > > > Istvan
> > > >
> > > > On Mon, Sep 9, 2024 at 11:06 AM Nick Dimiduk <ndimi...@apache.org>
> > > > wrote:
> > > >
> > > >> My understanding of our hadoop dependency policy is that we ship
> > > >> poms with hadoop versions pinned to the oldest compatible, "safe"
> > > >> version that is supported. Our test infrastructure has a "hadoop
> > > >> check" procedure that does some validation against other patch
> > > >> release versions.
> > > >>
> > > >> I don't know if anyone has done a CVE sweep recently. If there are
> > > >> new CVEs, we do bump the minimum supported version specified in the
> > > >> pom as part of patch releases. These changes need to include a pretty
> > > >> thorough compatibility check so that we can include release notes
> > > >> about any introduced incompatibilities.
> > > >>
> > > >> I am in favor of a dependency bump so as to address known CVEs as
> > > >> best as we reasonably can.
> > > >>
> > > >> Thanks,
> > > >> Nick
> > > >>
> > > >> On Mon, Sep 9, 2024 at 10:59 AM Istvan Toth <st...@apache.org>
> > > >> wrote:
> > > >>
> > > >> > Hi!
> > > >> >
> > > >> > I'm working on building the Phoenix uberjars with newer Hadoop
> > > >> > versions by default to improve its CVE stance, and I realized that
> > > >> > HBase itself does not use the latest releases.
> > > >> >
> > > >> > branch-2.5 defaults to 3.2.4
> > > >> > branch-2.6 and later defaults to 3.3.5
> > > >> >
> > > >> > I can kind of understand that we don't want to bump the minor
> > > >> > version for branch-2.5 from the one it was released with.
> > > >> >
> > > >> > However, I don't see the rationale for not upgrading branch-2.6 to
> > > >> > at least 3.3.6, and the unreleased branches (branch-2, branch-3,
> > > >> > master) to 3.4.0.
> > > >> >
> > > >> > I found a mention of wanting to stay off the latest patch release
> > > >> > in HBASE-27931, but I could not figure out whether it has a
> > > >> > technical reason, or if this is a written (or unwritten) policy.
> > > >> >
> > > >> > best regards
> > > >> > Istvan
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > > *István Tóth* | Sr. Staff Software Engineer
> > > > *Email*: st...@cloudera.com
> > > > cloudera.com <https://www.cloudera.com>
> > > >
> > >
> > >
> > >
> >
>


