Should we also make hadoop3 the default active profile for branch-2 going forward?
On Fri, Aug 26, 2022 at 5:25 PM Andrew Purtell <[email protected]> wrote:

> The security posture of Hadoop 2 in general is a problem, because
> maintenance on that branch is spotty, that is just how it goes. We had the
> same situation with our now EOL branch-1. I know Hadoop released 2.10.2 to
> address some CVE worthy problems but it is unclear if 2.10.2 addresses all
> known issues, unlike 3.3.4. Also as you know Hadoop 2 has unpatchable
> dependencies on org.codehaus versions of Jackson and Jetty, which
> themselves have high scoring CVEs that will never be fixed because they are
> EOL, and other similar issues. Hadoop 3 doesn’t completely solve such
> problems but is the only realistic place we can hope they can be addressed
> as required. For organizations that implement or require a top to bottom
> security audit of their software bill of materials, it seems possible to
> avoid user pain by providing supported convenience artifacts *and*
> libraries built against Hadoop 3 APIs in the Apache repository addressable
> with a Maven classifier.
>
> My employer has some interests in this area that align so I would like to
> sponsor (implement, review, commit, RM backfill releases, etc.) this work.
> Would there be any objections? Read through the thread for some thoughts on
> approach. Summarized:
>
> - Amend create-release to build, stage, and deploy a -hadoop3 variant
> build by activating the Hadoop 3 build profile.
>
> - Amend the Hadoop 3 build profile to flatten POMs before deployment to
> resolve potential downstream issues due to Hadoop 3 being a non-default
> build profile. (This could also be applied to all builds.)
>
> - Amend hbase-vote to be aware of -hadoop3 variant artifacts and to
> evaluate them if present.
>
> > On Aug 25, 2022, at 10:40 AM, Andrew Purtell <[email protected]> wrote:
> >
> > Thanks, that would work.
> >
> >> On Aug 25, 2022, at 11:35 AM, Sean Busbey <[email protected]> wrote:
> >>
> >> yes, the flatten plugin. We use it in hbase-connectors already.
> >>
> >> https://www.mojohaus.org/flatten-maven-plugin/
> >>
> >> this sounds like it could also be a use case for BOMs, which would also
> >> benefit users of our client artifacts that use build tools that don't
> >> respect maven profiles generally, like gradle.
> >>
> >>> On Thu, Aug 25, 2022 at 10:30 AM Andrew Purtell <[email protected]> wrote:
> >>>
> >>> Thinking about this a bit more, we will have an issue in that the POMs
> >>> published from our -hadoop3 build will not have a default activation
> >>> of our Hadoop 3 build profile. The convenience binaries will function
> >>> as expected but Maven will read and process e.g. Phoenix POMs, then
> >>> download and perform substitutions on HBase POMs, and then etc, so
> >>> downstreamers like Phoenix will have to set up the hadoop.profile
> >>> variable for us in their default build profile or else the transitive
> >>> paths through us may be wrong. I wonder if there is a Maven plugin
> >>> available for deploying POMs with all variable substitutions performed
> >>> before deployment, that would solve that problem and all conceivable
> >>> related issues.
> >>>
> >>>> On Aug 25, 2022, at 11:03 AM, Andrew Purtell <[email protected]> wrote:
> >>>>
> >>>> I think 2.x is going to have a few years of life remaining so it
> >>>> would be best, if we are going to address this, to have a 2.x
> >>>> solution as well as a 3.x one.
> >>>>
> >>>> In my opinion we can continue to publish 2.4 and 2.5 (and 2.6)
> >>>> unchanged and then also introduce a Hadoop 3 release using “hadoop3”
> >>>> or similar as Maven classifier. Phoenix could specify this classifier
> >>>> in their POMs. Everyone should be happy. Users who already are
> >>>> comfortable with the Hadoop 2 default don’t have to change anything.
> >>>> A one time POM change on the Phoenix side is required but that’s it.
> >>>>
> >>>> The additional build time complexity for generating two releases can
> >>>> be incorporated into create-release. Nobody does manual releases any
> >>>> more as far as I know. Likewise, download and verification of
> >>>> -hadoop3 convenience binaries can be added to hbase-vote. I believe
> >>>> we are all using that tool for verification of releases now. After
> >>>> these one time changes are landed the cost for RMs and PMC will be
> >>>> only in a roughly doubled amount of time needed to build and verify
> >>>> releases.
> >>>>
> >>>>> On Aug 17, 2022, at 9:06 AM, Nick Dimiduk <[email protected]> wrote:
> >>>>>
> >>>>> Hi Geoffrey,
> >>>>>
> >>>>> I have no complaints with shipping convenience binaries built
> >>>>> against both Hadoop2 and Hadoop3. The primary challenge is
> >>>>> implementing the necessary build changes, the secondary challenge is
> >>>>> verifying/testing it works reliably.
> >>>>>
> >>>>> But for Phoenix, are you asking for convenience binaries, or are you
> >>>>> asking for artifacts published into maven that have the Hadoop3
> >>>>> profile activated and specify the associated dependencies?
> >>>>>
> >>>>> I'm afraid that the 2.5.0 release ship has already sailed. I've
> >>>>> heard talk of a 2.6 "fast-follow", so maybe someone can have the
> >>>>> build changes ready for that? Also, isn't this a too little, too
> >>>>> late situation? Shouldn't we shift our focus to releasing 3.0, which
> >>>>> has dropped support for Hadoop2?
> >>>>>
> >>>>> Thanks,
> >>>>> Nick
> >>>>>
> >>>>>> On Tue, Aug 16, 2022 at 9:30 PM Geoffrey Jacoby <[email protected]> wrote:
> >>>>>>
> >>>>>> I see that the next HBase 2.5 RC is imminent, and before that's set
> >>>>>> in stone, I wanted to bring up the question of whether there will
> >>>>>> be official HBase 2.5 binaries built with the Hadoop 3 profile and
> >>>>>> available in the usual Maven repositories. (In addition to the
> >>>>>> usual Hadoop 2 profile binaries)
> >>>>>>
> >>>>>> The HBase 2.x line has a commitment to maintain support for Hadoop
> >>>>>> 2.x, but Hadoop 3.3 is the current stable Hadoop line and the most
> >>>>>> recent release notes [1] encourage all users of Hadoop 2.x to
> >>>>>> upgrade to Hadoop 3.
> >>>>>>
> >>>>>> Without convenience artifacts built against Hadoop 3, no end-users
> >>>>>> with Hadoop 3 clusters will be able to use the Apache-distributed
> >>>>>> binaries and will instead have to recompile HBase from source
> >>>>>> themselves, or use a 3rd party distribution that does so for them.
> >>>>>>
> >>>>>> This is especially inconvenient for downstream projects such as
> >>>>>> Apache Phoenix, which has never officially supported the HBase 2.x
> >>>>>> / Hadoop 2.10 combination. (It currently supports only HBase 2.3 or
> >>>>>> 2.4 with Hadoop 3. HBase 2.5 support will be added very shortly
> >>>>>> after its release as part of Phoenix 5.2.)
> >>>>>>
> >>>>>> To even run the Phoenix IT tests locally requires contributors to
> >>>>>> download the HBase source release and manually mvn install to their
> >>>>>> local maven repo using the Hadoop 3 profile, to avoid crashes in
> >>>>>> the HBase minicluster.[2] This is a barrier to new contributors and
> >>>>>> confuses even veteran ones, and has to be done again for every new
> >>>>>> HBase release.
> >>>>>>
> >>>>>> In general, I expect the Hadoop 3 user base to grow and the Hadoop
> >>>>>> 2.10 user base to shrink with every future HBase 2 release, so I
> >>>>>> think this is a worthwhile improvement.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Geoffrey
> >>>>>>
> >>>>>> [1] https://hadoop.apache.org/release/3.3.4.html
> >>>>>> [2] https://github.com/apache/phoenix/blob/master/BUILDING.md
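
For reference, the "flatten POMs before deployment" idea from the quoted summary might look roughly like the fragment below. This is only a sketch, not the actual HBase build change: the profile id (hadoop-3.0), the activation value (3.0), the plugin version, and the choice of flattenMode are all assumptions made for illustration. Only the hadoop.profile property name and the flatten-maven-plugin itself come from the thread.

```xml
<!-- Sketch only: profile id, activation value, plugin version, and
     flattenMode are assumptions, not taken from the HBase POMs. -->
<profile>
  <id>hadoop-3.0</id>
  <activation>
    <property>
      <!-- hadoop.profile is the variable named in the thread -->
      <name>hadoop.profile</name>
      <value>3.0</value>
    </property>
  </activation>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>flatten-maven-plugin</artifactId>
        <version>1.3.0</version>
        <configuration>
          <!-- "oss" keeps the metadata usually expected of published
               open-source artifacts while resolving properties -->
          <flattenMode>oss</flattenMode>
        </configuration>
        <executions>
          <!-- Write the flattened POM early so it is the one that gets
               installed and deployed -->
          <execution>
            <id>flatten</id>
            <phase>process-resources</phase>
            <goals>
              <goal>flatten</goal>
            </goals>
          </execution>
          <!-- Remove the generated flattened POM on mvn clean -->
          <execution>
            <id>flatten.clean</id>
            <phase>clean</phase>
            <goals>
              <goal>clean</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

With something like this in place, the POM deployed from a -hadoop3 build would carry already-substituted values, so a downstream project such as Phoenix would presumably not need to set hadoop.profile itself to get the right transitive dependencies.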
