In general, I’ve seen that Spark 2.0.0 and 2.1.0 are faster than 1.6.0 because of “whole-stage code generation” – per the release notes, roughly 2–10x speedups for common operators in SQL and DataFrames, including joins. The only thing that concerns me is the RDD-based MLlib API being put into maintenance mode in the 2.x line.
Given that, I’d say we should migrate to 2.0.x, start experimenting with Spark ML LDA, and keep supporting 1.6.0, as Nate suggests, for a year or so.

On 4/21/17, 6:59 PM, "Austin Leahy" <[email protected]> wrote:

Damn Michael beat me to it ;D

On Fri, Apr 21, 2017 at 4:58 PM Michael Ridley <[email protected]> wrote:
> Given that the project has not had a release, I don't see any reason to
> stick with 1.6 support. Now seems like a good time to switch to 2 if that's
> what people want to do. I haven't had time to do a deep dive on Spark 2 yet,
> so I don't have enough information to have a technical opinion, other than
> that I hear a lot of excitement and preference for Spark 2.
>
> Michael Ridley
> Senior Solutions Architect
> Cloudera
>
> Sent from my mobile.
> Pardon any spelling errors.
>
> > On Apr 21, 2017, at 6:39 PM, Segerlind, Nathan L <[email protected]> wrote:
> >
> > Hi everybody.
> >
> > There's been some talk about upgrading to Spark 2.1.
> >
> > Do people think this is worthwhile?
> >
> > Would others like to see continued support for 1.6? For how long and in
> > what capacity?
> >
> > Should we maintain two branches?
> >
> > Or perhaps drive the 2.1 branch forward and only send bug fixes to the
> > 1.6 branch for another year or so?
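For anyone who hasn't looked at the DataFrame-based API yet, here is a minimal sketch of what experimenting with Spark ML LDA in 2.x could look like. This is illustrative only – it needs a Spark 2.x runtime to execute, and the corpus, column names, and parameter values are my own assumptions, not anything from this thread:

```scala
// Sketch only: assumes a Spark 2.x dependency on the classpath.
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.CountVectorizer
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.split

object LdaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("lda-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy corpus; the "text" column name is an arbitrary choice.
    val docs = Seq(
      "spark sql joins",
      "lda topic model",
      "spark ml pipeline"
    ).toDF("text")
    val tokens = docs.select(split($"text", " ").as("tokens"))

    // Term counts -> the "features" vector column that ml.clustering.LDA expects.
    val cv = new CountVectorizer()
      .setInputCol("tokens")
      .setOutputCol("features")
    val counted = cv.fit(tokens).transform(tokens)

    // Fit a small topic model and inspect the top terms per topic.
    val lda = new LDA().setK(2).setMaxIter(10)
    val model = lda.fit(counted)
    model.describeTopics(3).show()

    spark.stop()
  }
}
```

One selling point of the spark.ml version over the old RDD-based one is that LDA slots into the same Pipeline/Estimator API as the rest of feature engineering, so the CountVectorizer above could live in a Pipeline with the LDA stage.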
