Twitter just led the release of Hadoop 2.6.5 precisely because they wanted to keep a Java 6 cluster up: the bigger your cluster, the less of a rush to upgrade.
HDP? I believe we install & prefer (OpenJDK) Java 8, but the Hadoop branch-2 line is intended to build/run on Java 7 too. There's always a conflict between us developers ("shiny new features") and ops ("keep the cluster alive"). That's actually where Scala has an edge: no need to upgrade the cluster-wide JVM just for an update, or play games configuring your deployed application to use a different JVM from the Hadoop services (which you can do, after all: it's just path setup; there's a sketch at the end of this thread). Thinking about it, knowing what can be done there (including documenting it in the Spark docs) could be a good migration strategy.

Me? I look forward to when we can use Java 9 to isolate transitive dependencies; the bane of everyone's life. Someone needs to start preparing everything for that to work, though.

On 28 Oct 2016, at 11:47, Chris Fregly <ch...@fregly.com> wrote:

i seem to remember a large spark user (tencent, i believe) chiming in late during these discussions 6-12 months ago and squashing any sort of deprecation given the massive effort that would be required to upgrade their environment.

i just want to make sure these convos take into consideration large spark users - and reflect the real world versus ideal world.

otherwise, this is all for naught like last time.

On Oct 28, 2016, at 10:43 AM, Sean Owen <so...@cloudera.com> wrote:

If the subtext is vendors, then I'd have a look at what recent distros look like. I'll write about CDH as a representative example, but I think other distros are naturally similar.

CDH has been on Java 8, Hadoop 2.6, Python 2.7 for almost two years (CDH 5.3 / Dec 2014). Granted, this depends on installing on an OS with that Java / Python version. But Java 8 / Python 2.7 is available for all of the supported OSes. The population that isn't on CDH 4 (because that support was dropped a long time ago in Spark), and who is on a version released 2-2.5 years ago and won't update, is a couple percent of the installed base. They do not in general want anything to change at all.

I assure everyone that vendors too are aligned in wanting to cater to the crowd that wants the most recent version of everything. For example, CDH offers both Spark 2.0.1 and 1.6 at the same time.

I wouldn't dismiss support for these supporting components as a relevant proxy for whether they are worth supporting in Spark. Java 7 is long since EOL (no, I don't count paying Oracle for support). No vendor is supporting Hadoop < 2.6. Scala 2.10 was EOL at the end of 2014. Is there a criterion here that reaches a different conclusion about these things just for Spark?

This was roughly the same conversation that happened 6 months ago. I imagine we're going to find that in about 6 months it'll make more sense all around to remove these. If we can just give a heads-up with deprecation and then kick the can down the road a bit more, that sounds like enough for now.

On Fri, Oct 28, 2016 at 8:58 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:

Deprecating them is fine (and I know they're already deprecated); the question is just whether to remove them. For example, what exactly is the downside of keeping Python 2.6 or Java 7 support right now? If it's high, then we can remove them, but I just haven't seen a ton of details. It also sounded like fairly recent versions of CDH, HDP, RHEL, etc. still have old versions of these.
Just talking with users, I've seen many people who say "we have a Hadoop cluster from $VENDOR, but we just download Spark from Apache and run newer versions of that". That's great for Spark IMO, and we need to stay compatible even with somewhat older Hadoop installs because they are time-consuming to update. Having the whole community on a small set of versions leads to a better experience for everyone and also to more of a "network effect": more people can battle-test new versions, answer questions about them online, write libraries that easily reach the majority of Spark users, etc.
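For reference, a rough sketch of the pattern Matei describes (a stock Apache Spark release on top of an existing vendor Hadoop cluster), assuming YARN and Spark's "Hadoop free" build; the paths, version numbers, and application class below are illustrative placeholders, not recommendations:

    # Unpack a stock Apache Spark release next to (not inside) the vendor install.
    tar xzf spark-2.0.1-bin-without-hadoop.tgz -C /opt

    # Point Spark at the cluster's existing Hadoop config and client jars
    # (per the "Hadoop free" build docs, these can also live in conf/spark-env.sh).
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)

    # Submit with the newer Spark; the cluster-side Hadoop services stay untouched.
    /opt/spark-2.0.1-bin-without-hadoop/bin/spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyApp \
      myapp.jar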
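And a minimal sketch of the per-application JVM setup Steve mentions above (the "it's just path setup" approach), assuming YARN and a newer JDK already installed at the same path on every node; the JDK path and application class are assumptions for illustration only:

    # A JDK installed on every node, separate from the one the Hadoop daemons use.
    NEW_JDK=/usr/lib/jvm/java-8-openjdk

    # Driver/client side: make spark-submit itself run on the newer JDK.
    export JAVA_HOME=$NEW_JDK

    # AM and executor side: ship the JAVA_HOME override with the application,
    # so only this app (not the cluster-wide Hadoop services) runs on the new JVM.
    spark-submit \
      --master yarn \
      --conf spark.yarn.appMasterEnv.JAVA_HOME=$NEW_JDK \
      --conf spark.executorEnv.JAVA_HOME=$NEW_JDK \
      --class com.example.MyApp \
      myapp.jar

The point of the design is that the upgrade decision becomes per-application rather than cluster-wide, which is exactly the migration strategy Steve suggests documenting.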