Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-06 Thread Sameer Agarwal
FYI -- Thanks to a big community-wide effort over the last few days, we're now down to just one last remaining code blocker again: https://issues.apache.org/jira/browse/SPARK-23309 I'll cut an RC3 as soon as that's resolved. On 4 February 2018 at 00:02, Xingbo Jiang

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-04 Thread Xingbo Jiang
I filed another NPE problem in the WebUI; I believe this is a regression in 2.3: https://issues.apache.org/jira/browse/SPARK-23330 2018-02-01 10:38 GMT-08:00 Tom Graves : > I filed a jira [SPARK-23304] Spark SQL coalesce() against hive not > working - ASF JIRA

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Sameer Agarwal
[+ Xiao] SPARK-23290 does sound like a blocker. On the SQL side, I can confirm that there were non-trivial changes around repartitioning/coalesce and cache performance in 2.3 -- we're currently investigating these. On 1 February 2018 at 10:02, Andrew Ash wrote: > I'd

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Andrew Ash
I'd like to nominate SPARK-23290 as a potential blocker for the 2.3.0 release. It's a regression from 2.2.0 in that user pyspark code that works in 2.2.0 now fails in the 2.3.0 RCs: the return type of date columns changed from object to

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Tom Graves
Testing with Spark 2.3, I see a difference from Spark 2.2 in how SQL coalesce behaves when talking to Hive. It seems Spark 2.3 ignores the coalesce. Query: spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= '20170301' AND dt <= '20170331' AND something IS NOT

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Michael Heuer
We found two classes new to Spark 2.3.0 that must be registered in Kryo for our tests to pass on RC2 org.apache.spark.sql.execution.datasources.BasicWriteTaskStats org.apache.spark.sql.execution.datasources.ExecutedWriteSummary https://github.com/bigdatagenomics/adam/pull/1897 Perhaps a mention
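When an application runs with `spark.kryo.registrationRequired` set to `true`, Kryo throws an exception for any class that was not registered, which is presumably why these new classes broke the tests (an assumption; the message is cut off before saying how ADAM registers them). A minimal sketch of supplying the two class names through the standard `spark.kryo.classesToRegister` setting; the helpers `kryo_settings` and `as_submit_args` are hypothetical.

```python
# Hypothetical helper, assuming the application enables
# spark.kryo.registrationRequired and must register the two classes
# reported as new in Spark 2.3.0.
NEW_IN_230 = [
    "org.apache.spark.sql.execution.datasources.BasicWriteTaskStats",
    "org.apache.spark.sql.execution.datasources.ExecutedWriteSummary",
]


def kryo_settings(extra_classes=()):
    """Return (key, value) pairs; the class list is comma-separated."""
    classes = list(extra_classes) + NEW_IN_230
    return [
        ("spark.serializer", "org.apache.spark.serializer.KryoSerializer"),
        ("spark.kryo.registrationRequired", "true"),
        ("spark.kryo.classesToRegister", ",".join(classes)),
    ]


def as_submit_args(pairs):
    """Flatten the pairs into spark-submit style --conf arguments."""
    args = []
    for key, value in pairs:
        args += ["--conf", f"{key}={value}"]
    return args


print(as_submit_args(kryo_settings()))
```

In PySpark the same pairs could be passed with `SparkConf().setAll(kryo_settings())`; an application that already uses a custom `KryoRegistrator` would more likely register the classes there instead.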

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Nick Pentreath
All MLlib QA JIRAs resolved. Looks like SparkR too, so from the ML side that should be everything outstanding. On Thu, 1 Feb 2018 at 06:21 Yin Huai wrote: > seems we are not running tests related to pandas in pyspark tests (see my > email "python tests related to pandas

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-31 Thread Yin Huai
It seems we are not running the pandas-related tests in the pyspark test suite (see my email "python tests related to pandas are skipped in jenkins"). I think we should fix this test issue and make sure all tests are good before cutting RC3. On Wed, Jan 31, 2018 at 10:12 AM, Sameer Agarwal

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-31 Thread Sameer Agarwal
Just a quick status update on RC3 -- SPARK-23274 was resolved yesterday and tests have been quite healthy throughout this week and the last. I'll cut the new RC as soon as the remaining blocker (SPARK-23202

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-30 Thread Andrew Ash
I'd like to nominate SPARK-23274 as a potential blocker for the 2.3.0 release as well, due to being a regression from 2.2.0. The ticket has a simple repro included, showing a query that works in prior releases but now fails with an exception in

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-26 Thread Sameer Agarwal
This vote has failed due to a number of aforementioned blockers. I'll follow up with RC3 as soon as the 2 remaining (non-QA) blockers are resolved: https://s.apache.org/oXKi On 25 January 2018 at 12:59, Sameer Agarwal wrote: > > Most tests pass on RC2, except I'm still

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Joseph Torres
SPARK-23221 fixes an issue specific to KafkaContinuousSourceStressForDontFailOnDataLossSuite; I don't think it could cause other suites to deadlock. Do note that the previous hang issues we saw caused by SPARK-23055 were correctly marked as failures. On Thu, Jan 25, 2018 at 3:40 PM,

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Shixiong(Ryan) Zhu
+ Jose On Thu, Jan 25, 2018 at 2:18 PM, Dongjoon Hyun wrote: > SPARK-23221 is one of the reasons for Kafka-test-suite deadlock issue. > > For the hang issues, it seems not to be marked as a failure correctly in > Apache Spark Jenkins history. > > > On Thu, Jan 25, 2018

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Dongjoon Hyun
SPARK-23221 is one of the reasons for Kafka-test-suite deadlock issue. For the hang issues, it seems not to be marked as a failure correctly in Apache Spark Jenkins history. On Thu, Jan 25, 2018 at 1:03 PM, Marcelo Vanzin wrote: > On Thu, Jan 25, 2018 at 12:29 PM, Sean

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen wrote: > I am still seeing these tests fail or hang: > > - subscribing topic by name from earliest offsets (failOnDataLoss: false) > - subscribing topic by name from earliest offsets (failOnDataLoss: true) This is something that we

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
> Most tests pass on RC2, except I'm still seeing the timeout caused by > https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never > finish. I followed the thread a bit further and wasn't clear whether it was > subsequently re-fixed for 2.3.0 or not. It says it's resolved along with >

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
I'm a -1 too. In addition to SPARK-23207, we've recently merged two codegen fixes (SPARK-23208 and SPARK-21717) that address a major

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sean Owen
Most tests pass on RC2, except I'm still seeing the timeout caused by https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never finish. I followed the thread a bit further and wasn't clear whether it was subsequently re-fixed for 2.3.0 or not. It says it's resolved along with

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Nick Pentreath
I think this has come up before (and Sean mentions it above), but the sub-items of SPARK-23105 (Spark MLlib, GraphX 2.3 QA umbrella) are actually marked as Blockers but are not targeted to 2.3.0. I think they should be, and I'm not comfortable with those not being resolved before voting

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
Sorry, have to change my vote again. Hive guys ran into SPARK-23209 and that's a regression we need to fix. I'll post a patch soon. So -1 (although others have already -1'ed). On Wed, Jan 24, 2018 at 11:42 AM, Marcelo Vanzin wrote: > Given that the bugs I was worried about

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread 蒋星博
I'm sorry to post -1 on this, since there is a non-trivial correctness issue that I believe we should fix in 2.3. TL;DR of the issue: a certain pattern of shuffle+repartition in a query may produce wrong results if some downstream stages fail and trigger a retry of the repartition; the reason for this
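The message is cut off before its explanation, so what follows is only a simplified model of how such a bug can arise (it appears to match the SPARK-23207 issue cited in Sameer Agarwal's -1, though this excerpt does not say so). Round-robin repartitioning assigns rows to output partitions by arrival order, so if a retried upstream stage delivers the same rows in a different order, a partially re-run downstream stage can mix old and new assignments. The `round_robin` helper is hypothetical and is not Spark's partitioner.

```python
def round_robin(rows, num_partitions):
    """Assign rows by arrival order: row i goes to partition i % n."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


# First attempt: the upstream shuffle delivers rows in this order.
first = round_robin(["a", "b", "c", "d"], 2)  # [['a', 'c'], ['b', 'd']]
# Retry: the same rows, fetched in a different order.
retry = round_robin(["b", "a", "c", "d"], 2)  # [['b', 'c'], ['a', 'd']]

# Suppose only downstream partition 0 failed and is recomputed from the
# retry, while partition 1 keeps its output from the first attempt.
mixed = retry[0] + first[1]  # ['b', 'c', 'b', 'd']: 'b' twice, 'a' lost

# Sorting before assignment makes the mapping independent of arrival order.
stable_first = round_robin(sorted(["a", "b", "c", "d"]), 2)
stable_retry = round_robin(sorted(["b", "a", "c", "d"]), 2)
print(mixed, stable_first == stable_retry)
```

The sort at the end is one way to make the assignment deterministic across retries; it is shown only to illustrate the failure mode, not as the fix that was merged.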

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-24 Thread Marcelo Vanzin
Given that the bugs I was worried about have been dealt with, I'm upgrading to +1. On Mon, Jan 22, 2018 at 5:09 PM, Marcelo Vanzin wrote: > +0 > > Signatures check out. Code compiles, although I see the errors in [1] > when untarring the source archive; perhaps we should add

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-23 Thread Xiao Li
+1 Xiao Li 2018-01-23 9:44 GMT-08:00 Marcelo Vanzin : > On Tue, Jan 23, 2018 at 7:01 AM, Sean Owen wrote: > > I'm not seeing that same problem on OS X and /usr/bin/tar. I tried > unpacking > > it with 'xvzf' and also unzipping it first, and it untarred

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-23 Thread Marcelo Vanzin
On Tue, Jan 23, 2018 at 7:01 AM, Sean Owen wrote: > I'm not seeing that same problem on OS X and /usr/bin/tar. I tried unpacking > it with 'xvzf' and also unzipping it first, and it untarred without warnings > in either case. The warnings just show up if you unpack using GNU

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-23 Thread Sean Owen
I'm not seeing that same problem on OS X and /usr/bin/tar. I tried unpacking it with 'xvzf' and also unzipping it first, and it untarred without warnings in either case. I am encountering errors while running the tests, different ones each time, so am still figuring out whether there is a real

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-22 Thread Wenchen Fan
+1 All the blocking issues are resolved (AFAIK), and important data source v2 features have been merged. On Tue, Jan 23, 2018 at 9:09 AM, Marcelo Vanzin wrote: > +0 > > Signatures check out. Code compiles, although I see the errors in [1] > when untarring the source

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-22 Thread Marcelo Vanzin
+0 Signatures check out. Code compiles, although I see the errors in [1] when untarring the source archive; perhaps we should add "use GNU tar" to the RM checklist? Also ran our internal tests and they seem happy. My concern is the list of open bugs targeted at 2.3.0 (ignoring the documentation

[VOTE] Spark 2.3.0 (RC2)

2018-01-22 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes are cast. [ ] +1 Release this package as Apache Spark 2.3.0 [ ] -1 Do not release this package
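The passing condition in the announcement can be read as the usual ASF release rule: at least three binding (PMC) +1 votes and more binding +1s than binding -1s, with +0 not counted. A minimal Python sketch of that tally, assuming only each voter's latest vote counts (as when a voter in this thread moved from +0 to +1 and then to -1); the `Vote` type, the function names, and the voters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # True if the voter is on the PMC


def latest_votes(votes):
    """Keep only each voter's most recent vote (votes listed oldest first)."""
    latest = {}
    for vote in votes:
        latest[vote.voter] = vote  # a later vote replaces an earlier one
    return list(latest.values())


def release_vote_passes(votes):
    """At least three binding +1s and more binding +1s than binding -1s."""
    current = latest_votes(votes)
    plus = sum(1 for v in current if v.binding and v.value == 1)
    minus = sum(1 for v in current if v.binding and v.value == -1)
    return plus >= 3 and plus > minus


# Hypothetical voters; "d" switches from +1 to -1, "e" is non-binding.
votes = [
    Vote("a", 1, True), Vote("b", 1, True), Vote("c", 1, True),
    Vote("d", 1, True), Vote("d", -1, True), Vote("e", 1, False),
]
print(release_vote_passes(votes))  # True: 3 binding +1s vs 1 binding -1
```

A tally is only the arithmetic; as this thread shows, the release manager can also fail a vote because of blockers reported while it is open.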