Generally I'm for updating dependencies, but I think that Hadoop should stick with semantic versioning and not make major or minor dependency updates in subminor releases.
For example, Hadoop 3.2.1 updated Guava to 27.0-jre, and because of this Spark 3.0 is stuck with Hadoop 3.2.0: it uses Hive 2.3.6, which doesn't support Guava 27.0-jre.

It would be better to make dependency upgrades when releasing new major/minor versions. For example, the Guava 27.0-jre upgrade would have been more appropriate for the Hadoop 3.3.0 release than for 3.2.1.

On Tue, Mar 10, 2020 at 3:03 PM Wei-Chiu Chuang <weic...@cloudera.com.invalid> wrote:

> I'm not hearing any feedback so far, but I want to suggest:
>
> use hadoop-thirdparty repository to host any dependencies that are known to
> break compatibility.
>
> Candidate #1 guava
> Candidate #2 Netty
> Candidate #3 Jetty
>
> in fact, HBase shades these dependencies for the exact same reason.
>
> As an example of the cost of compatibility breakage: we spent the last 6
> months backporting the guava update change (guava 11 --> 27) throughout
> Cloudera's stack, and after 6 months we are not done yet because we have to
> update guava in Hadoop, Hive, Spark ..., and Hadoop, Hive and Spark's guava
> is in the classpath of every application.
>
> Thoughts?
>
> On Sat, Mar 7, 2020 at 9:31 AM Wei-Chiu Chuang <weic...@apache.org> wrote:
>
> > Hi Hadoop devs,
> >
> > In the past, Hadoop tended to be pretty far behind the latest versions of
> > dependencies. Part of that is due to the fear of the breaking changes
> > brought in by the dependency updates.
> >
> > However, things have changed dramatically over the past few years. With
> > more focus on security vulnerabilities, more vulnerabilities are
> > discovered in our dependencies, and users put more pressure on patching
> > Hadoop (and its ecosystem) to use the latest dependency versions.
> >
> > As an example, Jackson-databind had 20 CVEs published in the last year
> > alone.
> > https://www.cvedetails.com/product/42991/Fasterxml-Jackson-databind.html?vendor_id=15866
> >
> > Jetty: 4 CVEs in 2019:
> > https://www.cvedetails.com/product/34824/Eclipse-Jetty.html?vendor_id=10410
> >
> > We can no longer let Hadoop stay behind. The more we stay behind, the
> > harder it is to update. A good example is the Jersey migration 1 -> 2,
> > HADOOP-15984 <https://issues.apache.org/jira/browse/HADOOP-15984>,
> > contributed by Akira. Jersey 1 is no longer supported. But the Jersey 2
> > migration is hard. If any critical vulnerability is found in Jersey 1, it
> > will leave us in a bad situation since we can't simply update the Jersey
> > version and be done.
> >
> > Hadoop 3 adds new public artifacts that shade these dependencies. We
> > should advocate that downstream applications use the public artifacts to
> > avoid breakage.
> >
> > I'd like to hear your thoughts: are you okay to see Hadoop keep up with
> > the latest dependency updates, or would you rather stay behind to ensure
> > compatibility?
> >
> > Coupled with that, I'd like to call for more frequent Hadoop releases for
> > the same purpose. IMHO that'll require better infrastructure to assist the
> > release work and some rethinking of our current Hadoop code structure,
> > like separating each subproject into its own repository and release
> > cadence. This can be controversial but I think it'll be good for the
> > project in the long run.
> >
> > Thanks,
> > Wei-Chiu