Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Xiao Li
-1. We have a correctness bug fix that was merged after 2.3.1 RC1. It would be nice to have it in the Spark 2.3.1 release. https://issues.apache.org/jira/browse/SPARK-24259 Xiao 2018-05-15 14:00 GMT-07:00 Marcelo Vanzin: > Please vote on releasing the following candidate as

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
It's in. That link is only a list of the currently open bugs. On Tue, May 15, 2018 at 2:02 PM, Justin Miller wrote: > Did SPARK-24067 not make it in? I don’t see it in https://s.apache.org/Q3Uo. > > Thanks, > Justin > > On May 15, 2018, at 3:00 PM, Marcelo Vanzin

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Justin Miller
Did SPARK-24067 not make it in? I don’t see it in https://s.apache.org/Q3Uo. Thanks, Justin > On May 15, 2018, at 3:00 PM, Marcelo Vanzin wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > The

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
I'll start with my +1 (binding). I've run unit tests and a bunch of integration tests on the hadoop-2.7 package. Please note that there are still a few flaky tests; check JIRA before you decide to send a -1 because of a flaky test. Also, apologies for the delay in getting the RC ready.

[VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. The vote is open until Friday, May 18, at 21:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.3.1 [ ] -1 Do not release this package because ... To

Re: Preventing predicate pushdown

2018-05-15 Thread Tomasz Gawęda
Thanks, filed https://issues.apache.org/jira/browse/SPARK-24288 Best regards, Tomek On 2018-05-15 18:29, Wenchen Fan wrote: Applying predicate pushdown is an optimization, and it makes sense to provide configs to turn off certain optimizations. Feel free to create a JIRA. Thanks,

Re: Preventing predicate pushdown

2018-05-15 Thread Wenchen Fan
Applying predicate pushdown is an optimization, and it makes sense to provide configs to turn off certain optimizations. Feel free to create a JIRA. Thanks, Wenchen On Tue, May 15, 2018 at 8:33 PM, Tomasz Gawęda wrote: > Hi, > > while working with JDBC datasource I saw

Preventing predicate pushdown

2018-05-15 Thread Tomasz Gawęda
Hi, while working with the JDBC data source I saw that many "or" clauses with non-equality operators cause a huge performance degradation of the SQL query sent to the database (DB2). For example: val df = spark.read.format("jdbc").(other options to parallelize load).load() df.where(s"(date1 > $param1 and

Re: Sort-merge join improvement

2018-05-15 Thread Petar Zecevic
Based on some reviews, I put additional effort into fixing the case when whole-stage codegen is turned off. Sort-merge join with additional range conditions is now 10x faster (it can be more or less, depending on the exact use case) in both cases, with whole-stage codegen turned off or on, compared to

Re: Integrating ML/DL frameworks with Spark

2018-05-15 Thread Bryan Cutler
Thanks for starting this discussion. I'd also like to see some improvements in this area, and I'm glad to hear that the Pandas UDFs / Arrow functionality might be useful. I'm wondering if from your initial investigations you found anything lacking from the Arrow format or possible improvements that