How do you manage and version such dependency upgrades in subminor Hadoop/Spark/Hive versions in Cloudera then? I would imagine that some upgrades will be breaking for customers and cannot be shipped in a subminor CDH release? Or is this in preparation for the next major/minor release of CDH?
On Wed, Mar 11, 2020 at 5:45 PM Wei-Chiu Chuang <weic...@cloudera.com.invalid> wrote:

> FWIW we are updating guava in Spark and Hive at Cloudera. I don't know which
> Apache versions they will land in, but we'll upstream them for sure.
>
> The guava change is debatable. It's not as critical as others. There are
> critical vulnerabilities in other dependencies that leave us no choice but to
> update to a new major/minor version, because we are so far behind. Given
> their critical nature, I think the risk is worth it and a backport to lower
> maintenance releases is warranted. Moreover, our minor releases come at best
> once per year. That is too slow to respond to a critical vulnerability.
>
> On Wed, Mar 11, 2020 at 5:02 PM Igor Dvorzhak <i...@google.com.invalid> wrote:
>
> > Generally I'm for updating dependencies, but I think that Hadoop should
> > stick with semantic versioning and not make major and minor dependency
> > updates in subminor releases.
> >
> > For example, Hadoop 3.2.1 updated Guava to 27.0-jre, and because of this
> > Spark 3.0 is stuck with Hadoop 3.2.0 - they use Hive 2.3.6, which doesn't
> > support Guava 27.0-jre.
> >
> > It would be better to make dependency upgrades when releasing new
> > major/minor versions; for example, the Guava 27.0-jre upgrade was more
> > appropriate for the Hadoop 3.3.0 release than for 3.2.1.
> >
> > On Tue, Mar 10, 2020 at 3:03 PM Wei-Chiu Chuang
> > <weic...@cloudera.com.invalid> wrote:
> >
> > > I'm not hearing any feedback so far, but I want to suggest:
> > >
> > > use the hadoop-thirdparty repository to host any dependencies that are
> > > known to break compatibility.
> > >
> > > Candidate #1: guava
> > > Candidate #2: Netty
> > > Candidate #3: Jetty
> > >
> > > In fact, HBase shades these dependencies for the exact same reason.
> > >
> > > As an example of the cost of compatibility breakage: we spent the last 6
> > > months backporting the guava update (guava 11 --> 27) throughout
> > > Cloudera's stack, and after 6 months we are not done yet, because we have
> > > to update guava in Hadoop, Hive, Spark ..., and Hadoop's, Hive's and
> > > Spark's guava is in the classpath of every application.
> > >
> > > Thoughts?
> > >
> > > On Sat, Mar 7, 2020 at 9:31 AM Wei-Chiu Chuang <weic...@apache.org> wrote:
> > >
> > > > Hi Hadoop devs,
> > > >
> > > > In the past, Hadoop has tended to be pretty far behind the latest
> > > > versions of its dependencies. Part of that is due to the fear of the
> > > > breaking changes brought in by dependency updates.
> > > >
> > > > However, things have changed dramatically over the past few years. With
> > > > more focus on security vulnerabilities, more vulnerabilities are being
> > > > discovered in our dependencies, and users put more pressure on patching
> > > > Hadoop (and its ecosystem) to use the latest dependency versions.
> > > >
> > > > As an example, Jackson-databind had 20 CVEs published in the last year
> > > > alone:
> > > > https://www.cvedetails.com/product/42991/Fasterxml-Jackson-databind.html?vendor_id=15866
> > > >
> > > > Jetty: 4 CVEs in 2019:
> > > > https://www.cvedetails.com/product/34824/Eclipse-Jetty.html?vendor_id=10410
> > > >
> > > > We can no longer let Hadoop stay behind. The more we stay behind, the
> > > > harder it is to update. A good example is the Jersey 1 -> 2 migration,
> > > > HADOOP-15984 <https://issues.apache.org/jira/browse/HADOOP-15984>,
> > > > contributed by Akira. Jersey 1 is no longer supported, but the Jersey 2
> > > > migration is hard. If any critical vulnerability is found in Jersey 1,
> > > > it will leave us in a bad situation, since we can't simply bump the
> > > > Jersey version and be done.
> > > >
> > > > Hadoop 3 adds new public artifacts that shade these dependencies. We
> > > > should advocate that downstream applications use the public artifacts
> > > > to avoid breakage.
> > > >
> > > > I'd like to hear your thoughts: are you okay with Hadoop keeping up
> > > > with the latest dependency updates, or would you rather stay behind to
> > > > ensure compatibility?
> > > >
> > > > Coupled with that, I'd like to call for more frequent Hadoop releases
> > > > for the same purpose. IMHO that will require better infrastructure to
> > > > assist the release work and some rethinking of our current Hadoop code
> > > > structure, like separating each subproject into its own repository and
> > > > release cadence. This can be controversial, but I think it'll be good
> > > > for the project in the long run.
> > > >
> > > > Thanks,
> > > > Wei-Chiu
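
To make the guava 11 --> 27 pain quoted above concrete, here is a rough Java sketch of the kind of source-incompatible changes involved. The two APIs shown (Objects.toStringHelper moving to MoreObjects, and Futures.addCallback growing a required Executor parameter) are picked as illustrations of removed or changed Guava methods, not as a catalogue of everything that breaks:

import com.google.common.base.MoreObjects;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

public class GuavaUpgradeExample {

  // Guava-11-era code called Objects.toStringHelper(this); that method was
  // later removed, so callers must switch to MoreObjects.toStringHelper.
  @Override
  public String toString() {
    return MoreObjects.toStringHelper(this).add("field", 42).toString();
  }

  // The two-argument Futures.addCallback(future, callback) overload was
  // removed; recent Guava requires an explicit Executor argument.
  static void watch(ListenableFuture<String> future) {
    Futures.addCallback(future, new FutureCallback<String>() {
      @Override public void onSuccess(String result) { /* handle result */ }
      @Override public void onFailure(Throwable t) { /* handle failure */ }
    }, MoreExecutors.directExecutor());
  }
}

Every project that compiled against the old signatures has to change at the same time, which is why a single guava bump ripples through Hadoop, Hive, Spark and every application that shares their classpath.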
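And a minimal sketch of what the hadoop-thirdparty / shading approach looks like from the code side. It assumes the relocated package prefix org.apache.hadoop.thirdparty.* that the shaded guava artifact uses (worth double-checking against the published artifact); Hadoop-internal code imports the relocated classes, so whatever guava version Hive, Spark or the application itself puts on the classpath no longer conflicts:

// Unrelocated guava (leaks Hadoop's guava choice onto every downstream
// classpath):
//   import com.google.common.base.Preconditions;
//
// Relocated guava from hadoop-thirdparty (private to Hadoop; package name
// assumed from the thirdparty relocation convention):
import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
import org.apache.hadoop.thirdparty.com.google.common.collect.ImmutableList;

import java.util.List;

public class ShadedGuavaExample {
  // Typical Hadoop-internal usage: same guava API, relocated package.
  public static ImmutableList<String> firstN(List<String> items, int n) {
    Preconditions.checkArgument(n >= 0, "n must be non-negative");
    return ImmutableList.copyOf(items.subList(0, Math.min(n, items.size())));
  }
}

The cost is a one-time rewrite of imports and a re-shade on each guava update, but after that a guava CVE can be handled inside hadoop-thirdparty without forcing Hive, Spark and every application to upgrade in lockstep.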