FYI. I found two more blockers: https://issues.apache.org/jira/browse/SPARK-23475 https://issues.apache.org/jira/browse/SPARK-23481
On Wed, Feb 21, 2018 at 9:45 AM, Xiao Li <gatorsm...@gmail.com> wrote:
> Hi, Ryan,
>
> In this release, Data Source V2 is experimental. We are still collecting
> feedback from the community and will improve the related APIs and
> implementation in the next 2.4 release.
>
> Thanks,
>
> Xiao
>
> 2018-02-21 9:43 GMT-08:00 Xiao Li <gatorsm...@gmail.com>:
>
>> Hi, Justin,
>>
>> Based on my understanding, SPARK-17147 is also not a regression. Thus,
>> Spark 2.3.0 cannot include it. We have to wait for the committers who
>> are familiar with Spark Streaming to decide whether we can fix the
>> issue in Spark 2.3.1.
>>
>> Since this is open source, feel free to add the patch in your local build.
>>
>> Thanks for using Spark!
>>
>> Xiao
>>
>>
>> 2018-02-21 9:36 GMT-08:00 Ryan Blue <rb...@netflix.com.invalid>:
>>
>>> No problem if we can't add them; this is experimental anyway, so this
>>> release should be more about validating the API and the start of our
>>> implementation. I just don't think we can recommend that anyone actually
>>> use DataSourceV2 without these patches.
>>>
>>> On Wed, Feb 21, 2018 at 9:21 AM, Wenchen Fan <cloud0...@gmail.com>
>>> wrote:
>>>
>>>> SPARK-23323 adds a new API, and I'm not sure we can still do it at this
>>>> stage of the release... Besides, users can work around it by calling the
>>>> Spark output coordinator themselves in their data source.
>>>>
>>>> SPARK-23203 is non-trivial and didn't fix any known bugs, so it's hard
>>>> to convince other people that it's safe to add it to the release during the
>>>> RC phase.
>>>>
>>>> SPARK-23418 depends on the above one.
>>>>
>>>> Generally they would have been good to have in Spark 2.3 had they been
>>>> merged before the RC. I think this is a lesson we should learn from: we
>>>> should work on stuff we want in the release before the RC, instead of
>>>> after.
>>>>
>>>> On Thu, Feb 22, 2018 at 1:01 AM, Ryan Blue <rb...@netflix.com.invalid>
>>>> wrote:
>>>>
>>>>> What does everyone think about getting some of the newer DataSourceV2
>>>>> improvements in? It should be low risk because it is a new code path, and
>>>>> v2 isn't very usable without things like support for using the output
>>>>> commit coordinator to deconflict writes.
>>>>>
>>>>> The ones I'd like to get in are:
>>>>> * Use the output commit coordinator:
>>>>> https://issues.apache.org/jira/browse/SPARK-23323
>>>>> * Use immutable trees and the same push-down logic as other read
>>>>> paths: https://issues.apache.org/jira/browse/SPARK-23203
>>>>> * Don't allow users to supply schemas when they aren't supported:
>>>>> https://issues.apache.org/jira/browse/SPARK-23418
>>>>>
>>>>> I think it would make the 2.3.0 release more usable for anyone
>>>>> interested in the v2 read and write paths.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu <weichen...@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin <van...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Done, thanks!
>>>>>>>
>>>>>>> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal <samee...@apache.org>
>>>>>>> wrote:
>>>>>>> > Sure, please feel free to backport.
>>>>>>> >
>>>>>>> > On 20 February 2018 at 18:02, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>>>> >>
>>>>>>> >> Hey Sameer,
>>>>>>> >>
>>>>>>> >> Mind including https://github.com/apache/spark/pull/20643
>>>>>>> >> (SPARK-23468) in the new RC? It's a minor bug since I've only hit it
>>>>>>> >> with older shuffle services, but it's pretty safe.
>>>>>>> >>
>>>>>>> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal <samee...@apache.org>
>>>>>>> >> wrote:
>>>>>>> >> > This RC has failed due to
>>>>>>> >> > https://issues.apache.org/jira/browse/SPARK-23470.
>>>>>>> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow up
>>>>>>> >> > with an RC5 soon.
>>>>>>> >> >
>>>>>>> >> > On 20 February 2018 at 16:49, Ryan Blue <rb...@netflix.com> wrote:
>>>>>>> >> >>
>>>>>>> >> >> +1
>>>>>>> >> >>
>>>>>>> >> >> Build & tests look fine, checked signature and checksums for src
>>>>>>> >> >> tarball.
>>>>>>> >> >>
>>>>>>> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
>>>>>>> >> >> <shixi...@databricks.com> wrote:
>>>>>>> >> >>>
>>>>>>> >> >>> I'm -1 because of the UI regression
>>>>>>> >> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page
>>>>>>> >> >>> may be too slow and cause "read timeout" when there are lots of jobs
>>>>>>> >> >>> and stages.
>>>>>>> >> >>> This is one of the most important pages because when it's broken, it's
>>>>>>> >> >>> pretty hard to use Spark Web UI.
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <marcogaid...@gmail.com>
>>>>>>> >> >>> wrote:
>>>>>>> >> >>>>
>>>>>>> >> >>>> +1
>>>>>>> >> >>>>
>>>>>>> >> >>>> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon <gurwls...@gmail.com>:
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> +1 too
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <ues...@happy-camper.st>:
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> +1
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
>>>>>>> >> >>>>>> <jiangxb1...@gmail.com>
>>>>>>> >> >>>>>> wrote:
>>>>>>> >> >>>>>>>
>>>>>>> >> >>>>>>> +1
>>>>>>> >> >>>>>>>
>>>>>>> >> >>>>>>>
>>>>>>> >> >>>>>>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> +1
>>>>>>> >> >>>>>>>>
>>>>>>> >> >>>>>>>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
>>>>>>> >> >>>>>>>> <r...@databricks.com>
>>>>>>> >> >>>>>>>> wrote:
>>>>>>> >> >>>>>>>>>
>>>>>>> >> >>>>>>>>> +1
>>>>>>> >> >>>>>>>>>
>>>>>>> >> >>>>>>>>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
>>>>>>> >> >>>>>>>>> <sameer.a...@gmail.com>, wrote:
>>>>>>> >> >>>>>>>>>>
>>>>>>> >> >>>>>>>>>> this file shouldn't be included?
>>>>>>> >> >>>>>>>>>>
>>>>>>> >> >>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>>>>>> >> >>>>>>>>>
>>>>>>> >> >>>>>>>>>
>>>>>>> >> >>>>>>>>> I've now deleted this file
>>>>>>> >> >>>>>>>>>
>>>>>>> >> >>>>>>>>>> From: Sameer Agarwal <sameer.a...@gmail.com>
>>>>>>> >> >>>>>>>>>> Sent: Saturday, February 17, 2018 1:43:39 PM
>>>>>>> >> >>>>>>>>>> To: Sameer Agarwal
>>>>>>> >> >>>>>>>>>> Cc: dev
>>>>>>> >> >>>>>>>>>> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>>>>>>> >> >>>>>>>>>>
>>>>>>> >> >>>>>>>>>> I'll start with a +1 once again.
>>>>>>> >> >>>>>>>>>>
>>>>>>> >> >>>>>>>>>> All blockers reported against RC3 have been resolved and the
>>>>>>> >> >>>>>>>>>> builds are healthy.
>>>>>>> >> >>>>>>>>>>
>>>>>>> >> >>>>>>>>>> On 17 February 2018 at 13:41, Sameer Agarwal
>>>>>>> >> >>>>>>>>>> <samee...@apache.org>
>>>>>>> >> >>>>>>>>>> wrote:
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> Please vote on releasing the following candidate as Apache
>>>>>>> >> >>>>>>>>>>> Spark version 2.3.0. The vote is open until Thursday February 22,
>>>>>>> >> >>>>>>>>>>> 2018 at 8:00:00 am UTC and passes if a majority of at least 3 PMC
>>>>>>> >> >>>>>>>>>>> +1 votes are cast.
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>> >> >>>>>>>>>>> https://spark.apache.org/
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> The tag to be voted on is v2.3.0-rc4:
>>>>>>> >> >>>>>>>>>>> https://github.com/apache/spark/tree/v2.3.0-rc4
>>>>>>> >> >>>>>>>>>>> (44095cb65500739695b0324c177c19dfa1471472)
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> List of JIRA tickets resolved in this release can be found here:
>>>>>>> >> >>>>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> The release files, including signatures, digests, etc. can be
>>>>>>> >> >>>>>>>>>>> found at:
>>>>>>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> Release artifacts are signed with the following key:
>>>>>>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> The staging repository for this release can be found at:
>>>>>>> >> >>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> FAQ
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> =======================================
>>>>>>> >> >>>>>>>>>>> What are the unresolved issues targeted for 2.3.0?
>>>>>>> >> >>>>>>>>>>> =======================================
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> Please see https://s.apache.org/oXKi.
>>>>>>> >> >>>>>>>>>>> At the time of writing, there are currently no known release
>>>>>>> >> >>>>>>>>>>> blockers.
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> =========================
>>>>>>> >> >>>>>>>>>>> How can I help test this release?
>>>>>>> >> >>>>>>>>>>> =========================
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> If you are a Spark user, you can help us test this release by
>>>>>>> >> >>>>>>>>>>> taking an existing Spark workload and running it on this release
>>>>>>> >> >>>>>>>>>>> candidate, then reporting any regressions.
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> If you're working in PySpark, you can set up a virtual env and
>>>>>>> >> >>>>>>>>>>> install the current RC and see if anything important breaks. In
>>>>>>> >> >>>>>>>>>>> Java/Scala, you can add the staging repository to your project's
>>>>>>> >> >>>>>>>>>>> resolvers and test with the RC (make sure to clean up the artifact
>>>>>>> >> >>>>>>>>>>> cache before/after so you don't end up building with an
>>>>>>> >> >>>>>>>>>>> out-of-date RC going forward).
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> ===========================================
>>>>>>> >> >>>>>>>>>>> What should happen to JIRA tickets still targeting 2.3.0?
>>>>>>> >> >>>>>>>>>>> ===========================================
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> Committers should look at those and triage. Extremely important
>>>>>>> >> >>>>>>>>>>> bug fixes, documentation, and API tweaks that impact
>>>>>>> >> >>>>>>>>>>> compatibility should be worked on immediately. Everything else
>>>>>>> >> >>>>>>>>>>> please retarget to 2.3.1 or 2.4.0 as appropriate.
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> ===================
>>>>>>> >> >>>>>>>>>>> Why is my bug not fixed?
>>>>>>> >> >>>>>>>>>>> ===================
>>>>>>> >> >>>>>>>>>>>
>>>>>>> >> >>>>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>> >> >>>>>>>>>>> release unless the bug in question is a regression from 2.2.0.
>>>>>>> >> >>>>>>>>>>> That being said, if there is something which is a regression
>>>>>>> >> >>>>>>>>>>> from 2.2.0 and has not been correctly targeted, please ping me
>>>>>>> >> >>>>>>>>>>> or a committer to help target the issue (you can see the open
>>>>>>> >> >>>>>>>>>>> issues listed as impacting Spark 2.3.0 at
>>>>>>> >> >>>>>>>>>>> https://s.apache.org/WmoI).
>>>>>>> >> >>>>>>>>>>
>>>>>>> >> >>>>>>>>>> --
>>>>>>> >> >>>>>>>>>> Sameer Agarwal
>>>>>>> >> >>>>>>>>>> Computer Science | UC Berkeley
>>>>>>> >> >>>>>>>>>> http://cs.berkeley.edu/~sameerag
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> --
>>>>>>> >> >>>>>> Takuya UESHIN
>>>>>>> >> >>>>>> Tokyo, Japan
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> http://twitter.com/ueshin
>>>>>>> >> >>
>>>>>>> >> >> --
>>>>>>> >> >> Ryan Blue
>>>>>>> >> >> Software Engineer
>>>>>>> >> >> Netflix
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Marcelo
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe e-mail:
dev-unsubscr...@spark.apache.org