While we're on this subject, there are two more dependency updates that we could consider including in 2.4.2 on the same grounds, as they're dependencies with CVEs. However, it's not clear whether the CVEs actually affect Spark. These are already in master.
https://issues.apache.org/jira/browse/SPARK-27469 https://issues.apache.org/jira/browse/SPARK-27470 On Fri, Apr 19, 2019 at 11:13 AM Sean Owen <sro...@gmail.com> wrote: > > All: here is the backport of changes to update to 2.9.8 from master back to > 2.4. > https://github.com/apache/spark/pull/24418 > > master has been on 2.9.8 for a while, so the concern isnt' Spark so > much. It's that user apps would face the same change of behavior if > they used Jackson in a similar way. I'm moderately in favor of > updating just as it's come up several times, but it's debatable. > > On Fri, Apr 19, 2019 at 11:01 AM Alessandro Bellina > <abell...@verizonmedia.com> wrote: > > > > We ran unit tests locally with the patch in > > https://github.com/apache/spark/pull/21596 on top of our 2.4 branch, so I > > think that branch-2.4 should pass that part. > > > > As far dependencies, things get murkier. I went through the dependencies > > where we have exclude clauses for databind. Here is the summary of those > > versions that were different than 2.9.6 (i.e. kafka-0.10 is using 2.9.6, so > > not listed below): > > > > hadoop is using jackson version 2.2.3 > > org.json4s moved to jackson.version 2.9.8, perhaps we should move to 2.9.8 > > instead? (though it should be compatible with 2.9.6) > > calcite 1.2.0-incubating uses jackson version 2.1.1, but it has moved to > > 2.9.6 in more recent versions. > > arrow 0.10.0 uses jackson version 2.7.9, but moved to 2.9.8 in version > > apache-arrow-0.13.0. > > io.fabric8 3.0+ uses jackson version 2.7.4. > > > > The risk to staying with the older version is in user code (or other > > libraries) not doing proper input data validation. The risk of moving up is > > we can introduce instability as there are a lot of moving parts. If we can > > think of a set of tests that can be done here, either manually or > > automated, I think that would be worthwhile to contribute. > > > > On Thu, Apr 18, 2019 at 9:53 PM Wenchen Fan <cloud0...@gmail.com> wrote: > >> > >> I've cut RC1. If people think we must upgrade Jackson in 2.4, I can cut > >> RC2 shortly. > >> > >> Thanks, > >> Wenchen > >> > >> On Fri, Apr 19, 2019 at 3:32 AM Felix Cheung <felixcheun...@hotmail.com> > >> wrote: > >>> > >>> Re shading - same argument I’ve made earlier today in a PR... > >>> > >>> (Context- in many cases Spark has light or indirect dependencies but > >>> bringing them into the process breaks users code easily) > >>> > >>> > >>> ________________________________ > >>> From: Michael Heuer <heue...@gmail.com> > >>> Sent: Thursday, April 18, 2019 6:41 AM > >>> To: Reynold Xin > >>> Cc: Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen Fan; > >>> Xiao Li > >>> Subject: Re: Spark 2.4.2 > >>> > >>> +100 > >>> > >>> > >>> On Apr 18, 2019, at 1:48 AM, Reynold Xin <r...@databricks.com> wrote: > >>> > >>> We should have shaded all Spark’s dependencies :( > >>> > >>> On Wed, Apr 17, 2019 at 11:47 PM Sean Owen <sro...@gmail.com> wrote: > >>>> > >>>> For users that would inherit Jackson and use it directly, or whose > >>>> dependencies do. Spark itself (with modifications) should be OK with > >>>> the change. > >>>> It's risky and normally wouldn't backport, except that I've heard a > >>>> few times about concerns about CVEs affecting Databind, so wondering > >>>> who else out there might have an opinion. I'm not pushing for it > >>>> necessarily. > >>>> > >>>> On Wed, Apr 17, 2019 at 6:18 PM Reynold Xin <r...@databricks.com> wrote: > >>>> > > >>>> > For Jackson - are you worrying about JSON parsing for users or > >>>> > internal Spark functionality breaking? > >>>> > > >>>> > On Wed, Apr 17, 2019 at 6:02 PM Sean Owen <sro...@gmail.com> wrote: > >>>> >> > >>>> >> There's only one other item on my radar, which is considering updating > >>>> >> Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up > >>>> >> a few times now that there are a number of CVEs open for 2.6.7. Cons: > >>>> >> not clear they affect Spark, and Jackson 2.6->2.9 does change Jackson > >>>> >> behavior non-trivially. That said back-porting the update PR to 2.4 > >>>> >> worked out OK locally. Any strong opinions on this one? > >>>> >> > >>>> >> On Wed, Apr 17, 2019 at 7:49 PM Wenchen Fan <cloud0...@gmail.com> > >>>> >> wrote: > >>>> >> > > >>>> >> > I volunteer to be the release manager for 2.4.2, as I was also > >>>> >> > going to propose 2.4.2 because of the reverting of SPARK-25250. Is > >>>> >> > there any other ongoing bug fixes we want to include in 2.4.2? If > >>>> >> > no I'd like to start the release process today (CST). > >>>> >> > > >>>> >> > Thanks, > >>>> >> > Wenchen > >>>> >> > > >>>> >> > On Thu, Apr 18, 2019 at 3:44 AM Sean Owen <sro...@gmail.com> wrote: > >>>> >> >> > >>>> >> >> I think the 'only backport bug fixes to branches' principle > >>>> >> >> remains sound. But what's a bug fix? Something that changes > >>>> >> >> behavior to match what is explicitly supposed to happen, or > >>>> >> >> implicitly supposed to happen -- implied by what other similar > >>>> >> >> things do, by reasonable user expectations, or simply how it > >>>> >> >> worked previously. > >>>> >> >> > >>>> >> >> Is this a bug fix? I guess the criteria that matches is that > >>>> >> >> behavior doesn't match reasonable user expectations? I don't know > >>>> >> >> enough to have a strong opinion. I also don't think there is > >>>> >> >> currently an objection to backporting it, whatever it's called. > >>>> >> >> > >>>> >> >> > >>>> >> >> Is the question whether this needs a new release? There's no harm > >>>> >> >> in another point release, other than needing a volunteer release > >>>> >> >> manager. One could say, wait a bit longer to see what more info > >>>> >> >> comes in about 2.4.1. But given that 2.4.1 took like 2 months, > >>>> >> >> it's reasonable to move towards a release cycle again. I don't see > >>>> >> >> objection to that either (?) > >>>> >> >> > >>>> >> >> > >>>> >> >> The meta question remains: is a 'bug fix' definition even agreed, > >>>> >> >> and being consistently applied? There aren't correct answers, only > >>>> >> >> best guesses from each person's own experience, judgment and > >>>> >> >> priorities. These can differ even when applied in good faith. > >>>> >> >> > >>>> >> >> Sometimes the variance of opinion comes because people have > >>>> >> >> different info that needs to be surfaced. Here, maybe it's best to > >>>> >> >> share what about that offline conversation was convincing, for > >>>> >> >> example. > >>>> >> >> > >>>> >> >> I'd say it's also important to separate what one would prefer from > >>>> >> >> what one can't live with(out). Assuming one trusts the intent and > >>>> >> >> experience of the handful of others with an opinion, I'd defer to > >>>> >> >> someone who wants X and will own it, even if I'm moderately > >>>> >> >> against it. Otherwise we'd get little done. > >>>> >> >> > >>>> >> >> In that light, it seems like both of the PRs at issue here are not > >>>> >> >> _wrong_ to backport. This is a good pair that highlights why, when > >>>> >> >> there isn't a clear reason to do / not do something (e.g. obvious > >>>> >> >> errors, breaking public APIs) we give benefit-of-the-doubt in > >>>> >> >> order to get it later. > >>>> >> >> > >>>> >> >> > >>>> >> >> On Wed, Apr 17, 2019 at 12:09 PM Ryan Blue > >>>> >> >> <rb...@netflix.com.invalid> wrote: > >>>> >> >>> > >>>> >> >>> Sorry, I should be more clear about what I'm trying to say here. > >>>> >> >>> > >>>> >> >>> In the past, Xiao has taken the opposite stance. A good example > >>>> >> >>> is PR #21060 that was a very similar situation: behavior didn't > >>>> >> >>> match what was expected and there was low risk. There was a long > >>>> >> >>> argument and the patch didn't make it into 2.3 (to my knowledge). > >>>> >> >>> > >>>> >> >>> What we call these low-risk behavior fixes doesn't matter. I > >>>> >> >>> called it a bug on #21060 but I'm applying Xiao's previous > >>>> >> >>> definition here to make a point. Whatever term we use, we clearly > >>>> >> >>> have times when we want to allow a patch because it is low risk > >>>> >> >>> and helps someone. Let's just be clear that that's perfectly fine. > >>>> >> >>> > >>>> >> >>> On Wed, Apr 17, 2019 at 9:34 AM Ryan Blue <rb...@netflix.com> > >>>> >> >>> wrote: > >>>> >> >>>> > >>>> >> >>>> How is this a bug fix? > >>>> >> >>>> > >>>> >> >>>> On Wed, Apr 17, 2019 at 9:30 AM Xiao Li <lix...@databricks.com> > >>>> >> >>>> wrote: > >>>> >> >>>>> > >>>> >> >>>>> Michael and I had an offline discussion about this PR > >>>> >> >>>>> https://github.com/apache/spark/pull/24365. He convinced me > >>>> >> >>>>> that this is a bug fix. The code changes of this bug fix are > >>>> >> >>>>> very tiny and the risk is very low. To avoid any behavior > >>>> >> >>>>> change in the patch releases, this PR also added a legacy flag > >>>> >> >>>>> whose default value is off. > >>>> >> >>>>> > >>>> >> > >>>> >> --------------------------------------------------------------------- > >>>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > >>>> >> > >>> > >>> --------------------------------------------------------------------- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org