The security posture of Hadoop 2 in general is a problem because maintenance
on that branch is spotty. We had the same situation with our now-EOL branch-1.
I know Hadoop released 2.10.2 to address some CVE-worthy problems, but it is
unclear whether 2.10.2 addresses all known issues, unlike 3.3.4. Also, as you
know, Hadoop 2 has unpatchable dependencies on org.codehaus versions of Jackson
and Jetty, which themselves have high-scoring CVEs that will never be fixed
because they are EOL, along with other similar issues. Hadoop 3 doesn’t
completely solve such problems, but it is the only realistic place where we can
hope to see them addressed as required. For organizations that implement or
require a top-to-bottom security audit of their software bill of materials, it
seems possible to avoid user pain by providing supported convenience artifacts
*and* libraries built against Hadoop 3 APIs in the Apache repository,
addressable with a Maven classifier.
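
As a rough illustration of what "addressable with a Maven classifier" could
look like for a downstream project such as Phoenix, here is a minimal sketch.
The classifier value "hadoop3" is only the name floated in this thread, and
the hbase.version property is a placeholder assumed for the example; nothing
here describes published coordinates.

```xml
<!-- Sketch only: a downstream POM selecting a Hadoop 3 variant of an HBase
     artifact. The "hadoop3" classifier is the name proposed in this thread,
     not an existing published coordinate. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <!-- hbase.version is a placeholder property assumed for this example -->
  <version>${hbase.version}</version>
  <classifier>hadoop3</classifier>
</dependency>
```

Users who are comfortable with the Hadoop 2 default would leave the
classifier element out and change nothing.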

My employer has some interests in this area that align, so I would like to
sponsor (implement, review, commit, RM backfill releases, etc.) this work.
Would there be any objections? See the thread below for some thoughts on
approach. Summarized:

- Amend create-release to build, stage, and deploy a -hadoop3 variant build by 
activating the Hadoop 3 build profile. 

- Amend the Hadoop 3 build profile to flatten POMs before deployment to resolve 
potential downstream issues due to Hadoop 3 being a non-default build profile. 
(This could also be applied to all builds.)

- Amend hbase-vote to be aware of -hadoop3 variant artifacts and to evaluate
them if present.
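
For the second item, a minimal sketch of what wiring the flatten plugin into
a build profile might look like. The thread only names the plugin; the
version, the flattenMode value, and the phase bindings below are assumptions
chosen for illustration, and the fragment is meant to sit under
build/plugins of the Hadoop 3 profile (or of the main build, if applied to
all builds).

```xml
<!-- Sketch only: flatten-maven-plugin bound so that the deployed POM has
     properties resolved instead of depending on a profile being activated
     downstream. Version and flattenMode are assumptions, not project
     decisions. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>flatten-maven-plugin</artifactId>
  <version>1.3.0</version>
  <configuration>
    <!-- "oss" keeps the metadata usually expected of published OSS POMs -->
    <flattenMode>oss</flattenMode>
  </configuration>
  <executions>
    <!-- generate the flattened POM early in the build -->
    <execution>
      <id>flatten</id>
      <phase>process-resources</phase>
      <goals>
        <goal>flatten</goal>
      </goals>
    </execution>
    <!-- remove the generated flattened POM on "mvn clean" -->
    <execution>
      <id>flatten.clean</id>
      <phase>clean</phase>
      <goals>
        <goal>clean</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

Whether the flattened output resolves every hadoop.profile-dependent value
the way downstreamers need would have to be verified against the real build.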


> On Aug 25, 2022, at 10:40 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> 
> Thanks, that would work. 
> 
>> On Aug 25, 2022, at 11:35 AM, Sean Busbey <bus...@apache.org> wrote:
>> 
>> yes, the flatten plugin. We use it in hbase-connectors already.
>> 
>> https://www.mojohaus.org/flatten-maven-plugin/
>> 
>> this sounds like it could also be a use case for BOMs, which would also
>> benefit users of our client artifacts that use build tools that don't
>> respect maven profiles generally, like gradle.
>> 
>>> On Thu, Aug 25, 2022 at 10:30 AM Andrew Purtell <andrew.purt...@gmail.com>
>>> wrote:
>>> 
>>> Thinking about this a bit more, we will have an issue in that the POMs
>>> published from our -hadoop3 build will not have a default activation of our
>>> Hadoop 3 build profile. The convenience binaries will function as expected
>>> but Maven will read and process eg Phoenix POMs, then download and perform
>>> substitutions on HBase POMs, and then etc, so downstreamers like Phoenix
>>> will have to set up the hadoop.profile variable for us in their default
>>> build profile or else the transitive paths through us may be wrong. I
>>> wonder if there is a Maven plugin available for deploying POMs with all
>>> variable substitutions performed before deployment, that would solve that
>>> problem and all conceivable related issues.
>>> 
>>>> On Aug 25, 2022, at 11:03 AM, Andrew Purtell <andrew.purt...@gmail.com>
>>>> wrote:
>>>> 
>>>> I think 2.x is going to have a few years of life remaining so it would
>>>> be best, if we are going to address this, to have a 2.x solution as well
>>>> as a 3.x one.
>>>> 
>>>> In my opinion we can continue to publish 2.4 and 2.5 (and 2.6) unchanged
>>>> and then also introduce a Hadoop 3 release using “hadoop3” or similar as
>>>> Maven classifier. Phoenix could specify this classifier in their POMs.
>>>> Everyone should be happy. Users who already are comfortable with the
>>>> Hadoop 2 default don’t have to change anything. A one time POM change on
>>>> the Phoenix side is required but that’s it.
>>>> 
>>>> The additional build time complexity for generating two releases can be
>>>> incorporated into create-release. Nobody does manual releases any more as
>>>> far as I know. Likewise, download and verification of -hadoop3
>>>> convenience binaries can be added to hbase-vote. I believe we are all
>>>> using that tool for verification of releases now. After these one time
>>>> changes are landed the cost for RMs and PMC will be only in a roughly
>>>> doubled amount of time needed to build and verify releases.
>>>> 
>>>>> On Aug 17, 2022, at 9:06 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
>>>>> 
>>>>> Hi Geoffrey,
>>>>> 
>>>>> I have no complaints with shipping convenience binaries built against
>>>>> both Hadoop2 and Hadoop3. The primary challenge is implementing the
>>>>> necessary build changes, the secondary challenge is verifying/testing it
>>>>> works reliably.
>>>>> 
>>>>> But for Phoenix, are you asking for convenience binaries, or are you
>>>>> asking for artifacts published into maven that have the Hadoop3 profile
>>>>> activated and specify the associated dependencies?
>>>>> 
>>>>> I'm afraid that the 2.5.0 release ship has already sailed. I've heard
>>>>> talk of a 2.6 "fast-follow", so maybe someone can have the build changes
>>>>> ready for that? Also, isn't this a too little, too late situation?
>>>>> Shouldn't we shift our focus to releasing 3.0, which has dropped support
>>>>> for Hadoop2?
>>>>> 
>>>>> Thanks,
>>>>> Nick
>>>>> 
>>>>>> On Tue, Aug 16, 2022 at 9:30 PM Geoffrey Jacoby <gjac...@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>> I see that the next HBase 2.5 RC is imminent, and before that's set in
>>>>>> stone, I wanted to bring up the question of whether there will be
>>>>>> official HBase 2.5 binaries built with the Hadoop 3 profile and
>>>>>> available in the usual Maven repositories. (In addition to the usual
>>>>>> Hadoop 2 profile binaries)
>>>>>> 
>>>>>> The HBase 2.x line has a commitment to maintain support for Hadoop 2.x,
>>>>>> but Hadoop 3.3 is the current stable Hadoop line and the most recent
>>>>>> release notes [1] encourage all users of Hadoop 2.x to upgrade to
>>>>>> Hadoop 3.
>>>>>> 
>>>>>> Without convenience artifacts built against Hadoop 3, no end-users with
>>>>>> Hadoop 3 clusters will be able to use the Apache-distributed binaries
>>>>>> and will instead have to recompile HBase from source themselves, or use
>>>>>> a 3rd party distribution that does so for them.
>>>>>> 
>>>>>> This is especially inconvenient for downstream projects such as Apache
>>>>>> Phoenix, which has never officially supported the HBase 2.x / Hadoop
>>>>>> 2.10 combination. (It currently supports only HBase 2.3 or 2.4 with
>>>>>> Hadoop 3. HBase 2.5 support will be added very shortly after its
>>>>>> release as part of Phoenix 5.2.)
>>>>>> 
>>>>>> To even run the Phoenix IT tests locally requires contributors to
>>>>>> download the HBase source release and manually mvn install to their
>>>>>> local maven repo using the Hadoop 3 profile, to avoid crashes in the
>>>>>> HBase minicluster.[2] This is a barrier to new contributors and
>>>>>> confuses even veteran ones, and has to be done again for every new
>>>>>> HBase release.
>>>>>> 
>>>>>> In general, I expect the Hadoop 3 user base to grow and the Hadoop 2.10
>>>>>> user base to shrink with every future HBase 2 release, so I think this
>>>>>> is a worthwhile improvement.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Geoffrey
>>>>>> 
>>>>>> [1] https://hadoop.apache.org/release/3.3.4.html
>>>>>> [2] https://github.com/apache/phoenix/blob/master/BUILDING.md
>>>>>> 
>>> 
