+1 to supporting *only* Java 8 (and later) in Spark 2.0, and ending support for Java 7. It makes sense.

Regards
JB

On 03/24/2016 08:27 AM, Reynold Xin wrote:
About a year ago we decided to drop Java 6 support in Spark 1.5. I am
wondering if we should also just drop Java 7 support in Spark 2.0 (i.e.
Spark 2.0 would require Java 8 to run).

Oracle ended public updates for JDK 7 about one year ago (Apr 2015), and
removed public downloads for JDK 7 in July 2015. In the past I've
actually been against dropping Java 7, but today I ran into an issue
with the new Dataset API not working well with Java 8 lambdas, and that
changed my opinion on this.

I've been thinking more about this issue today and also talked with a
lot of people offline to gather feedback, and I actually think the pros
outweigh the cons, for the following reasons (in some rough order of
importance):

1. It is complicated to test how well Spark APIs work for Java lambdas
if we support Java 7. Jenkins machines need to have both Java 7 and Java
8 installed and we must run through a set of test suites in 7, and then
the lambda tests in Java 8. This complicates build environments/scripts,
and makes them less robust. Without good testing infrastructure, I have
no confidence in building good APIs for Java 8.

2. Dataset/DataFrame performance will be between 1x and 10x slower in
Java 7. The primary APIs we want users to use in Spark 2.x are
Dataset/DataFrame, and this impacts pretty much everything from machine
learning to structured streaming. We have made great progress in their
performance through extensive use of code generation. (In many
dimensions Spark 2.0 with DataFrames/Datasets looks more like a compiler
than a MapReduce or query engine.) These optimizations don't work well
in Java 7 due to broken code cache flushing. This problem has been fixed
by Oracle in Java 8. In addition, Java 8 comes with better support for
Unsafe and SIMD.

3. Scala 2.12 will come out soon, and we will want to add support for
that. Scala 2.12 only works on Java 8. If we do support Java 7, we'd
have a fairly complicated compatibility matrix and testing infrastructure.

4. There are libraries that I've looked into in the past that support
only Java 8. This is more common in high performance libraries such as
Aeron (a messaging library). Having to support Java 7 means we are not
able to use these. It is not that big of a deal right now, but it will
become increasingly difficult as we optimize performance.
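
As a rough illustration of point 1, the sketch below shows the two call
styles a Java-facing API has to be tested against: the Java 7 anonymous
inner class and the Java 8 lambda. It does not use Spark. The
MapFunction interface, the mapAll helper and the LambdaStyles class are
hypothetical names made up for this example, and the sketch only shows
the difference in calling style, not the specific Dataset issue
mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LambdaStyles {
    // Hypothetical single-method interface (an assumption for this sketch,
    // not Spark's actual API). Any interface with exactly one abstract
    // method can be implemented with a lambda in Java 8.
    public interface MapFunction<T, R> {
        R call(T value);
    }

    // Hypothetical helper that applies f to every element of the input list.
    public static <T, R> List<R> mapAll(List<T> input, MapFunction<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T item : input) {
            out.add(f.call(item));
        }
        return out;
    }

    // Java 7 style: the function is passed as an anonymous inner class.
    public static List<Integer> lengthsJava7(List<String> words) {
        return mapAll(words, new MapFunction<String, Integer>() {
            @Override
            public Integer call(String s) {
                return s.length();
            }
        });
    }

    // Java 8 style: the same function written as a lambda. This method
    // only compiles with a Java 8 (or later) compiler.
    public static List<Integer> lengthsJava8(List<String> words) {
        return mapAll(words, s -> s.length());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "java", "dataset");
        System.out.println(lengthsJava7(words)); // prints [5, 4, 7]
        System.out.println(lengthsJava8(words)); // prints [5, 4, 7]
    }
}
```

Because lengthsJava8 cannot be compiled by a Java 7 compiler, a project
that supports both versions has to keep lambda-based tests in a
separately built suite, which is the build complication described in
point 1.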


The downside of not supporting Java 7 is also obvious. Some
organizations are stuck with Java 7, and they wouldn't be able to use
Spark 2.0 without upgrading Java.



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
