steveloughran commented on PR #4491:
URL: https://github.com/apache/hadoop/pull/4491#issuecomment-1164417288

   (you are going to hate me here. sorry)
   
   First, please let's not have "update a few dependency" patches. It is not a 
useful title, and updating multiple dependencies simultaneously makes it a 
lot harder to identify problems through git bisect and makes the changes harder 
to roll back and cherry-pick.
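
To make the "one dependency per patch" point concrete, here is a minimal sketch of a check that lists which POM version properties differ between two revisions. It is only an illustration, not anything in the Hadoop build: the function names, the property names and the "old" version numbers are hypothetical, and it assumes versions are declared in a `<properties>` block.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip the "{namespace}" prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def pom_properties(pom_xml: str) -> dict:
    """Return the <properties> entries of a POM as {name: value}."""
    root = ET.fromstring(pom_xml)
    props = {}
    for elem in root.iter():
        if isinstance(elem.tag, str) and _local(elem.tag) == "properties":
            for child in elem:
                if isinstance(child.tag, str):
                    props[_local(child.tag)] = (child.text or "").strip()
    return props


def changed_properties(old_pom: str, new_pom: str) -> dict:
    """Map each property whose value differs to (old_value, new_value)."""
    old, new = pom_properties(old_pom), pom_properties(new_pom)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Hypothetical POM fragments; only 1.12.132 comes from this thread.
OLD_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <zookeeper.version>3.5.0</zookeeper.version>
    <aws-java-sdk.version>1.0.0</aws-java-sdk.version>
  </properties>
</project>"""

NEW_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <zookeeper.version>3.5.0</zookeeper.version>
    <aws-java-sdk.version>1.12.132</aws-java-sdk.version>
  </properties>
</project>"""

changes = changed_properties(OLD_POM, NEW_POM)
# A patch that is easy to bisect and revert touches exactly one property.
print(len(changes) == 1, changes)
```

If more than one property shows up in `changes`, the patch is a candidate for splitting into one commit per dependency.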
   
   Second, we must not have anything in this release which isn't already in 
branch-3.3, where it has been stabilising through the use other developers have 
been making of that branch.
   
   Finally, I am scared of any and all last-minute updates of dependencies, as 
the blast radius of a change of a few digits in a number in a POM file can have 
a dramatic impact on a project two hops away.
   
   That's why I believe the default decision on any last minute dependency 
update should be "no". This is worth bearing in mind as I intend to share 
release manager responsibilities with Mukund on the branch-3.3 feature release 
this summer, and refusing last-minute changes is going to be my default action, 
especially when it comes to jar updates. Get those changes in and stabilising 
now!
   
   
   ## jetty
   
   -1 to the jetty update because I'm scared of what will break. The hadoop.next 
release will upgrade to jetty 2 and shade it. 
   
   ## Htrace
   
   -1 to htrace as it was fixed in this branch by #3520
   
   ```
   9e2936f8d1f HADOOP-17424. Replace HTrace with No-Op tracer (#3520)
   ```
   
   If this is not the case then we have a serious issue which needs to be fixed 
across all the recent branches. File a critical Hadoop JIRA and we can go from 
there.
   
   ## Zookeeper
   
   -1 until/unless in branch-3.3
   
   Interesting one there. trunk is on 3.6.3 after "HADOOP-17612. Upgrade 
Zookeeper to 3.6.3 and Curator to 5.2.0" (#3241).
   
   For any change there, an increment on 3.5.x is lower risk and may not need a 
matching Curator increment, but that'd still need qualification.
   
   For the branch-3.3 release, why don't we cherry-pick #3241 and follow-ons?
   
   ## AWS SDK
   
   -1 to updating the AWS SDK except as a standalone cherry-pick of our 
branch-3.3 patch #3864, with full requalification.
   
   ```
   d8ab84275e0 - HADOOP-18068. upgrade AWS SDK to 1.12.132 (#3864)
   ```
   
   The SDK is covered in HADOOP-18068; any backporting should just be a 
cherry-pick. But as with most AWS SDK updates, it caused a regression 
(HADOOP-18085). Anyone proposing it as a backport has to:
   1. Run the full hadoop-aws integration test suite with `-Dscale` and declare 
which endpoint they ran against.
   2. Look at the section "Qualifying an AWS SDK Update" and treat the 
instructions there as a MUST, not a MAY: 
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/testing.html#Qualifying_an_AWS_SDK_Update
   3. Note that instruction 1 there is "Don’t make this a last minute action."
   
   I have encountered other cases where people have been updating this SDK 
dependency without raising it with me. Yes, tools do highlight Jackson 
serialisation issues which exist in the shaded Jackson dependency. However, the 
AWS SDK does not use those bits of Jackson. And because nothing else uses 
those bits of Jackson in this library, precisely because they are shaded, the 
risk is not actually manifest in the S3A connector. Given this fact and the 
qualification process, I don't want to include it. 
   
   If you really want this in, create a single PR cherry-picking HADOOP-18068 
and all follow-on fixes which are applicable to this branch, say which AWS 
endpoint you ran the hadoop-aws test suites against, and do the entire SDK 
update qualification covered in the testing doc. I will then merge the chain of 
commits in one by one.
   
   This should be safe because we have actually been using this in branch 3.3+ 
and other than the regression in tests there have been no adverse consequences. 
It MUST be the exact version we have been using (1.12.132) as no later release 
has been validated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

