Greetings,

I would also like to ask whether the following ticket could make it into 2.3.0. I'm currently testing the code in production because we were (very occasionally) running into non-consecutive offsets on non-compacted topics. I imagine other people will encounter similar issues if they're processing 15+ billion records a day.
https://github.com/apache/spark/pull/20572 (SPARK-17147)

Thanks,
Justin

> On Feb 21, 2018, at 10:21 AM, kant kodali <kanth...@gmail.com> wrote:
>
> Hi All,
>
> +1 for the tickets proposed by Ryan Blue
>
> Any possible chance of this one
> https://issues.apache.org/jira/browse/SPARK-23406 getting into 2.3.0? It's
> a very important feature for us so if it doesn't make the cut I would have to
> cherry-pick this commit and compile from the source for our production
> release.
>
> Thanks!
>
> On Wed, Feb 21, 2018 at 9:01 AM, Ryan Blue <rb...@netflix.com.invalid> wrote:
> What does everyone think about getting some of the newer DataSourceV2
> improvements in? It should be low risk because it is a new code path, and v2
> isn't very usable without things like support for using the output commit
> coordinator to deconflict writes.
>
> The ones I'd like to get in are:
> * Use the output commit coordinator:
>   https://issues.apache.org/jira/browse/SPARK-23323
> * Use immutable trees and the same push-down logic as other read paths:
>   https://issues.apache.org/jira/browse/SPARK-23203
> * Don't allow users to supply schemas when they aren't supported:
>   https://issues.apache.org/jira/browse/SPARK-23418
>
> I think it would make the 2.3.0 release more usable for anyone interested in
> the v2 read and write paths.
>
> Thanks!
>
> On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu <weichen...@databricks.com> wrote:
> +1
>
> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Done, thanks!
>
> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal <samee...@apache.org> wrote:
> > Sure, please feel free to backport.
> >
> > On 20 February 2018 at 18:02, Marcelo Vanzin <van...@cloudera.com> wrote:
> >>
> >> Hey Sameer,
> >>
> >> Mind including https://github.com/apache/spark/pull/20643
> >> (SPARK-23468) in the new RC? It's a minor bug since I've only hit it
> >> with older shuffle services, but it's pretty safe.
> >>
> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal <samee...@apache.org>
> >> wrote:
> >> > This RC has failed due to
> >> > https://issues.apache.org/jira/browse/SPARK-23470.
> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow
> >> > up with an RC5 soon.
> >> >
> >> > On 20 February 2018 at 16:49, Ryan Blue <rb...@netflix.com> wrote:
> >> >>
> >> >> +1
> >> >>
> >> >> Build & tests look fine, checked signature and checksums for src
> >> >> tarball.
> >> >>
> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
> >> >> <shixi...@databricks.com> wrote:
> >> >>>
> >> >>> I'm -1 because of the UI regression
> >> >>> https://issues.apache.org/jira/browse/SPARK-23470: the All Jobs page
> >> >>> may be too slow and cause "read timeout" when there are lots of jobs
> >> >>> and stages.
> >> >>> This is one of the most important pages because when it's broken, it's
> >> >>> pretty hard to use Spark Web UI.
> >> >>>
> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <marcogaid...@gmail.com>
> >> >>> wrote:
> >> >>>>
> >> >>>> +1
> >> >>>>
> >> >>>> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon <gurwls...@gmail.com>:
> >> >>>>>
> >> >>>>> +1 too
> >> >>>>>
> >> >>>>> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <ues...@happy-camper.st>:
> >> >>>>>>
> >> >>>>>> +1
> >> >>>>>>
> >> >>>>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
> >> >>>>>> <jiangxb1...@gmail.com> wrote:
> >> >>>>>>>
> >> >>>>>>> +1
> >> >>>>>>>
> >> >>>>>>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan <cloud0...@gmail.com>
> >> >>>>>>> wrote:
> >> >>>>>>>>
> >> >>>>>>>> +1
> >> >>>>>>>>
> >> >>>>>>>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
> >> >>>>>>>> <r...@databricks.com> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> +1
> >> >>>>>>>>>
> >> >>>>>>>>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
> >> >>>>>>>>> <sameer.a...@gmail.com>, wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>> this file shouldn't be included?
> >> >>>>>>>>>>
> >> >>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
> >> >>>>>>>>>
> >> >>>>>>>>> I've now deleted this file
> >> >>>>>>>>>
> >> >>>>>>>>>> From: Sameer Agarwal <sameer.a...@gmail.com>
> >> >>>>>>>>>> Sent: Saturday, February 17, 2018 1:43:39 PM
> >> >>>>>>>>>> To: Sameer Agarwal
> >> >>>>>>>>>> Cc: dev
> >> >>>>>>>>>> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
> >> >>>>>>>>>>
> >> >>>>>>>>>> I'll start with a +1 once again.
> >> >>>>>>>>>>
> >> >>>>>>>>>> All blockers reported against RC3 have been resolved and the
> >> >>>>>>>>>> builds are healthy.
> >> >>>>>>>>>>
> >> >>>>>>>>>> On 17 February 2018 at 13:41, Sameer Agarwal
> >> >>>>>>>>>> <samee...@apache.org> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Please vote on releasing the following candidate as Apache
> >> >>>>>>>>>>> Spark version 2.3.0. The vote is open until Thursday February 22,
> >> >>>>>>>>>>> 2018 at 8:00:00 am UTC and passes if a majority of at least 3 PMC
> >> >>>>>>>>>>> +1 votes are cast.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.3.0
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> [ ] -1 Do not release this package because ...
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> To learn more about Apache Spark, please see
> >> >>>>>>>>>>> https://spark.apache.org/
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> The tag to be voted on is v2.3.0-rc4:
> >> >>>>>>>>>>> https://github.com/apache/spark/tree/v2.3.0-rc4
> >> >>>>>>>>>>> (44095cb65500739695b0324c177c19dfa1471472)
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> The list of JIRA tickets resolved in this release can be found
> >> >>>>>>>>>>> here:
> >> >>>>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> The release files, including signatures, digests, etc. can be
> >> >>>>>>>>>>> found at:
> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Release artifacts are signed with the following key:
> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> The staging repository for this release can be found at:
> >> >>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1265/
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> The documentation corresponding to this release can be found
> >> >>>>>>>>>>> at:
> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> FAQ
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> =======================================
> >> >>>>>>>>>>> What are the unresolved issues targeted for 2.3.0?
> >> >>>>>>>>>>> =======================================
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Please see https://s.apache.org/oXKi. At the time of writing,
> >> >>>>>>>>>>> there are no known release blockers.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> =========================
> >> >>>>>>>>>>> How can I help test this release?
> >> >>>>>>>>>>> =========================
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> If you are a Spark user, you can help us test this release by
> >> >>>>>>>>>>> taking an existing Spark workload, running it on this release
> >> >>>>>>>>>>> candidate, and then reporting any regressions.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> If you're working in PySpark, you can set up a virtual env,
> >> >>>>>>>>>>> install the current RC, and see if anything important breaks.
> >> >>>>>>>>>>> In Java/Scala, you can add the staging repository to your
> >> >>>>>>>>>>> project's resolvers and test with the RC (make sure to clean up
> >> >>>>>>>>>>> the artifact cache before/after so you don't end up building
> >> >>>>>>>>>>> with an out-of-date RC going forward).
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> ===========================================
> >> >>>>>>>>>>> What should happen to JIRA tickets still targeting 2.3.0?
> >> >>>>>>>>>>> ===========================================
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Committers should look at those and triage. Extremely important
> >> >>>>>>>>>>> bug fixes, documentation, and API tweaks that impact
> >> >>>>>>>>>>> compatibility should be worked on immediately. Please retarget
> >> >>>>>>>>>>> everything else to 2.3.1 or 2.4.0 as appropriate.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> ===================
> >> >>>>>>>>>>> Why is my bug not fixed?
> >> >>>>>>>>>>> ===================
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> In order to make timely releases, we will typically not hold the
> >> >>>>>>>>>>> release unless the bug in question is a regression from 2.2.0.
> >> >>>>>>>>>>> That being said, if there is something that is a regression
> >> >>>>>>>>>>> from 2.2.0 and has not been correctly targeted, please ping me or
> >> >>>>>>>>>>> a committer to help target the issue (you can see the open issues
> >> >>>>>>>>>>> listed as impacting Spark 2.3.0 at https://s.apache.org/WmoI).
> >> >>>>>>>>>>
> >> >>>>>>>>>> --
> >> >>>>>>>>>> Sameer Agarwal
> >> >>>>>>>>>> Computer Science | UC Berkeley
> >> >>>>>>>>>> http://cs.berkeley.edu/~sameerag
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> Takuya UESHIN
> >> >>>>>> Tokyo, Japan
> >> >>>>>>
> >> >>>>>> http://twitter.com/ueshin
> >> >>
> >> >> --
> >> >> Ryan Blue
> >> >> Software Engineer
> >> >> Netflix
> >>
> >> --
> >> Marcelo
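The RC4 announcement quoted above suggests testing from Java/Scala by adding the staging repository to your project's resolvers. Assuming an sbt build, a minimal `build.sbt` fragment for this might look like the one below. The staging URL comes from the announcement. The following details are illustrative assumptions: the resolver label, the `sparkStagingUrl` name, the Scala patch version, and the choice of `spark-sql` as the example module. This sketch also assumes that the staged artifacts carry the plain version string `2.3.0` rather than an RC suffix.

```scala
// build.sbt: a minimal sketch for testing the Spark 2.3.0 RC4 artifacts.
// Assumption: the staged artifacts are published under the version "2.3.0"
// (no "-rc4" suffix). Names and versions below are illustrative only.

// Spark 2.3.0 artifacts are built for Scala 2.11 (note the _2.11 suffix in the RC files).
scalaVersion := "2.11.12"

// Staging repository URL taken from the RC4 vote announcement.
val sparkStagingUrl = "https://repository.apache.org/content/repositories/orgapachespark-1265/"

// Hypothetical resolver label; any name works.
resolvers += "Apache Spark 2.3.0 RC4 staging" at sparkStagingUrl

// Example module only; depend on whichever Spark modules your workload uses.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
```

If the staged artifacts reuse the final version string, a cached RC jar cannot be distinguished from the released one. That is why the announcement asks testers to clear the local artifact cache before and after testing.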