Should we also make hadoop3 the default active profile for branch-2 going
forward?

On Fri, Aug 26, 2022 at 5:25 PM Andrew Purtell <[email protected]>
wrote:

> The security posture of Hadoop 2 in general is a problem, because
> maintenance on that branch is spotty; that is just how it goes. We had the
> same situation with our now EOL branch-1. I know Hadoop released 2.10.2 to
> address some CVE-worthy problems but it is unclear if 2.10.2 addresses all
> known issues, unlike 3.3.4. Also as you know Hadoop 2 has unpatchable
> dependencies on the org.codehaus version of Jackson and the org.mortbay version of Jetty, which
> themselves have high scoring CVEs that will never be fixed because they are
> EOL, and other similar issues. Hadoop 3 doesn’t completely solve such
> problems but is the only realistic place we can hope they can be addressed
> as required. For organizations that implement or require a top to bottom
> security audit of their software bill of materials, it seems possible to
> avoid user pain by providing supported convenience artifacts *and*
> libraries built against Hadoop 3 APIs in the Apache repository addressable
> with a Maven classifier.
>
> My employer has some interests in this area that align so I would like to
> sponsor (implement, review, commit, RM backfill releases, etc.) this work.
> Would there be any objections? Read through the thread for some thoughts on
> approach. Summarized:
>
> - Amend create-release to build, stage, and deploy a -hadoop3 variant
> build by activating the Hadoop 3 build profile.
>
> - Amend the Hadoop 3 build profile to flatten POMs before deployment to
> resolve potential downstream issues due to Hadoop 3 being a non-default
> build profile. (This could also be applied to all builds.)
>
> - Amend hbase-vote to detect -hadoop3 variant artifacts and, if present,
> evaluate them.
>
>
> > On Aug 25, 2022, at 10:40 AM, Andrew Purtell <[email protected]> wrote:
> >
> > Thanks, that would work.
> >
> >> On Aug 25, 2022, at 11:35 AM, Sean Busbey <[email protected]> wrote:
> >>
> >> yes, the flatten plugin. We use it in hbase-connectors already.
> >>
> >> https://www.mojohaus.org/flatten-maven-plugin/
> >>
> >> this sounds like it could also be a use case for BOMs, which would also
> >> benefit users of our client artifacts that use build tools that don't
> >> respect maven profiles generally, like gradle.
> >>
> >>> On Thu, Aug 25, 2022 at 10:30 AM Andrew Purtell <[email protected]> wrote:
> >>>
> >>> Thinking about this a bit more, we will have an issue in that the POMs
> >>> published from our -hadoop3 build will not have a default activation of
> >>> our Hadoop 3 build profile. The convenience binaries will function as
> >>> expected, but Maven will read and process e.g. Phoenix POMs, then
> >>> download and perform substitutions on HBase POMs, and so on, so
> >>> downstreamers like Phoenix will have to set up the hadoop.profile
> >>> variable for us in their default build profile or else the transitive
> >>> paths through us may be wrong. I wonder if there is a Maven plugin
> >>> available for deploying POMs with all variable substitutions performed
> >>> before deployment; that would solve that problem and all conceivable
> >>> related issues.
> >>>
> >>>> On Aug 25, 2022, at 11:03 AM, Andrew Purtell <[email protected]> wrote:
> >>>>
> >>>> I think 2.x is going to have a few years of life remaining so it would
> >>>> be best, if we are going to address this, to have a 2.x solution as
> >>>> well as a 3.x one.
> >>>>
> >>>> In my opinion we can continue to publish 2.4 and 2.5 (and 2.6)
> >>>> unchanged and then also introduce a Hadoop 3 release using “hadoop3” or
> >>>> similar as Maven classifier. Phoenix could specify this classifier in
> >>>> their POMs. Everyone should be happy. Users who already are comfortable
> >>>> with the Hadoop 2 default don’t have to change anything. A one time POM
> >>>> change on the Phoenix side is required but that’s it.
> >>>>
> >>>> The additional build time complexity for generating two releases can be
> >>>> incorporated into create-release. Nobody does manual releases any more
> >>>> as far as I know. Likewise, download and verification of -hadoop3
> >>>> convenience binaries can be added to hbase-vote. I believe we are all
> >>>> using that tool for verification of releases now. After these one time
> >>>> changes are landed the cost for RMs and PMC will be only in a roughly
> >>>> doubled amount of time needed to build and verify releases.
> >>>>
> >>>>>> On Aug 17, 2022, at 9:06 AM, Nick Dimiduk <[email protected]> wrote:
> >>>>>>
> >>>>> Hi Geoffrey,
> >>>>>
> >>>>> I have no complaints with shipping convenience binaries built against
> >>>>> both Hadoop2 and Hadoop3. The primary challenge is implementing the
> >>>>> necessary build changes; the secondary challenge is verifying/testing
> >>>>> that it works reliably.
> >>>>>
> >>>>> But for Phoenix, are you asking for convenience binaries, or are you
> >>>>> asking for artifacts published into maven that have the Hadoop3
> >>>>> profile activated and specify the associated dependencies?
> >>>>>
> >>>>> I'm afraid that the 2.5.0 release ship has already sailed. I've heard
> >>>>> talk of a 2.6 "fast-follow", so maybe someone can have the build
> >>>>> changes ready for that? Also, isn't this a too little, too late
> >>>>> situation? Shouldn't we shift our focus to releasing 3.0, which has
> >>>>> dropped support for Hadoop2?
> >>>>>
> >>>>> Thanks,
> >>>>> Nick
> >>>>>
> >>>>>>> On Tue, Aug 16, 2022 at 9:30 PM Geoffrey Jacoby <[email protected]> wrote:
> >>>>>>
> >>>>>> I see that the next HBase 2.5 RC is imminent, and before that's set in
> >>>>>> stone, I wanted to bring up the question of whether there will be
> >>>>>> official HBase 2.5 binaries built with the Hadoop 3 profile and
> >>>>>> available in the usual Maven repositories (in addition to the usual
> >>>>>> Hadoop 2 profile binaries).
> >>>>>>
> >>>>>> The HBase 2.x line has a commitment to maintain support for Hadoop
> >>>>>> 2.x, but Hadoop 3.3 is the current stable Hadoop line and the most
> >>>>>> recent release notes [1] encourage all users of Hadoop 2.x to upgrade
> >>>>>> to Hadoop 3.
> >>>>>>
> >>>>>> Without convenience artifacts built against Hadoop 3, no end-users
> >>>>>> with Hadoop 3 clusters will be able to use the Apache-distributed
> >>>>>> binaries and will instead have to recompile HBase from source
> >>>>>> themselves, or use a 3rd party distribution that does so for them.
> >>>>>>
> >>>>>> This is especially inconvenient for downstream projects such as Apache
> >>>>>> Phoenix, which has never officially supported the HBase 2.x / Hadoop
> >>>>>> 2.10 combination. (It currently supports only HBase 2.3 or 2.4 with
> >>>>>> Hadoop 3. HBase 2.5 support will be added very shortly after its
> >>>>>> release as part of Phoenix 5.2.)
> >>>>>>
> >>>>>> To even run the Phoenix IT tests locally requires contributors to
> >>>>>> download the HBase source release and manually mvn install to their
> >>>>>> local maven repo using the Hadoop 3 profile, to avoid crashes in the
> >>>>>> HBase minicluster. [2] This is a barrier to new contributors and
> >>>>>> confuses even veteran ones, and has to be done again for every new
> >>>>>> HBase release.
> >>>>>>
> >>>>>> In general, I expect the Hadoop 3 user base to grow and the Hadoop
> >>>>>> 2.10 user base to shrink with every future HBase 2 release, so I think
> >>>>>> this is a worthwhile improvement.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Geoffrey
> >>>>>>
> >>>>>> [1] https://hadoop.apache.org/release/3.3.4.html
> >>>>>> [2] https://github.com/apache/phoenix/blob/master/BUILDING.md
> >>>>>>
> >>>
>
