Hi Al,

Spark 2.0 doesn't mean Spark 1.x will stop. New features will go into Spark 2.0, but maintenance releases can still be made from the 1.x branch.

Regards
JB

On 03/24/2016 05:38 PM, Al Pivonka wrote:
As an end user (developer) and cluster admin, I would have to agree with
Koert.

To me the real question is timing: the current version is 1.6.1, so how many
more releases will there be before 2.0, and on what time frame?

If you give people six to twelve months to plan, and make sure they know
(post it all over the web site), most can plan ahead.


Just my two pennies





On Thu, Mar 24, 2016 at 12:25 PM, Sean Owen <so...@cloudera.com> wrote:

    (PS CDH5 runs fine with Java 8, but I understand your more general
    point.)

    This is a familiar context indeed, but in that context, would a group
    not wanting to update to Java 8 want to manually put Spark 2.0 into
    the mix? That is, if this is a context where the cluster is
    purposefully some stable mix of components, would you be updating just
    one?

    You make a good point about Scala being more a library than an
    infrastructure component, so it can be updated on a per-app basis. While
    it's harder to handle different Scala versions on the framework side,
    it's less hard on the deployment side.

    On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <ko...@tresata.com> wrote:
     > i think the arguments are convincing, but it also makes me wonder if i
     > live in some kind of alternate universe... we deploy on customers'
     > clusters, where the OS, python version, java version and hadoop distro
     > are not chosen by us. so think centos 6, cdh5 or hdp 2.3, java 7 and
     > python 2.6. we simply have access to a single proxy machine and launch
     > through yarn. asking them to upgrade java is pretty much out of the
     > question, or a 6+ month ordeal. of the 10 client clusters i can think
     > of off the top of my head, all of them are on java 7 and none are on
     > java 8. so by doing this you would make spark 2 basically unusable for
     > us (unless most of them have plans of upgrading to java 8 in the near
     > term; i will ask around and report back...).
     >
     > on a side note, it's particularly interesting to me that spark 2 chose
     > to continue support for scala 2.10, because even for us in our very
     > constricted client environments the scala version is something we can
     > easily upgrade (we just deploy a custom build of spark for the
     > relevant scala version and hadoop distro). and because scala is not a
     > dependency of any hadoop distro (so not on the classpath, which i am
     > very happy about) we can use whatever scala version we like. also, i
     > found the upgrade path from scala 2.10 to 2.11 to be very easy, so i
     > have a hard time understanding why anyone would stay on scala 2.10.
     > and finally, with scala 2.12 around the corner, you really don't want
     > to be supporting 3 versions. so clearly i am missing something here.
     >
     >
     >
     > On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré
     > <j...@nanthrax.net> wrote:
     >>
     >> +1 to support Java 8 (and future) *only* in Spark 2.0, and end
     >> support of Java 7. It makes sense.
     >>
     >> Regards
     >> JB
     >>
     >>
     >> On 03/24/2016 08:27 AM, Reynold Xin wrote:
     >>>
      >>> About a year ago we decided to drop Java 6 support in Spark 1.5. I
      >>> am wondering if we should also just drop Java 7 support in Spark
      >>> 2.0 (i.e. Spark 2.0 would require Java 8 to run).
     >>>
      >>> Oracle ended public updates for JDK 7 one year ago (Apr 2015), and
      >>> removed public downloads for JDK 7 in July 2015. In the past I've
      >>> actually been against dropping Java 7, but today I ran into an
      >>> issue with the new Dataset API not working well with Java 8
      >>> lambdas, and that changed my opinion on this.
     >>>
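A minimal sketch of one way this can surface, assuming the 1.6-era Java
Dataset API (the thread doesn't name the exact issue; the variable names and
data here are hypothetical): map is overloaded for both Scala and Java
function types, so a bare Java 8 lambda is ambiguous and needs an explicit
cast.

    import java.util.Arrays;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;

    // Assumes an existing SQLContext named sqlContext (hypothetical setup).
    Dataset<String> words =
        sqlContext.createDataset(Arrays.asList("drop", "java", "seven"),
                                 Encoders.STRING());

    // Ambiguous and does not compile: map is overloaded for Scala's
    // Function1 and Java's MapFunction, so the lambda has no target type.
    // Dataset<Integer> broken = words.map(s -> s.length(), Encoders.INT());

    // Compiles, but only with an explicit cast to the Java interface.
    Dataset<Integer> lengths =
        words.map((MapFunction<String, Integer>) s -> s.length(),
                  Encoders.INT());
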
      >>> I've been thinking more about this issue today and also talked with
      >>> a lot of people offline to gather feedback, and I actually think
      >>> the pros outweigh the cons, for the following reasons (in some
      >>> rough order of importance):
     >>>
      >>> 1. It is complicated to test how well Spark APIs work for Java
      >>> lambdas if we support Java 7. Jenkins machines need to have both
      >>> Java 7 and Java 8 installed, and we must run through a set of test
      >>> suites in Java 7 and then the lambda tests in Java 8. This
      >>> complicates build environments/scripts and makes them less robust.
      >>> Without good testing infrastructure, I have no confidence in
      >>> building good APIs for Java 8.
     >>>
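To make the dual-JDK test burden concrete, a hedged sketch using the
standard Spark 1.x Java RDD API (lines is an assumed JavaRDD<String>): the
same transformation must be exercised both as a Java 7 anonymous inner class
and as a Java 8 lambda, and the lambda form only compiles on a JDK 8
toolchain, hence the second test pass.

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;

    // Java 7 style: anonymous inner class, compiles under both JDKs.
    JavaRDD<Integer> lengths7 = lines.map(new Function<String, Integer>() {
      @Override
      public Integer call(String s) {
        return s.length();
      }
    });

    // Java 8 style: a lambda against the same SAM interface. This only
    // compiles under JDK 8, so it needs its own suite on a JDK 8 build.
    JavaRDD<Integer> lengths8 = lines.map(s -> s.length());
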
      >>> 2. Dataset/DataFrame performance will be between 1x and 10x slower
      >>> in Java 7. The primary APIs we want users to use in Spark 2.x are
      >>> Dataset/DataFrame, and this impacts pretty much everything from
      >>> machine learning to structured streaming. We have made great
      >>> progress in their performance through extensive use of code
      >>> generation. (In many dimensions Spark 2.0 with DataFrames/Datasets
      >>> looks more like a compiler than a MapReduce or query engine.) These
      >>> optimizations don't work well in Java 7 due to broken code cache
      >>> flushing. This problem has been fixed by Oracle in Java 8. In
      >>> addition, Java 8 comes with better support for Unsafe and SIMD.
     >>>
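For context, a minimal sketch of the DataFrame code path the
code-generation argument is about (assuming an existing SQLContext and a
hypothetical events.json input with a user column): even a one-line
aggregation is compiled into generated Java at runtime, and that generated
code is what lands in the JIT code cache that Java 7 flushes badly.

    import org.apache.spark.sql.DataFrame;

    // Hypothetical input; assumes an existing SQLContext named sqlContext.
    DataFrame events = sqlContext.read().json("events.json");

    // Spark generates and compiles Java code for this plan at runtime;
    // that generated code fills the JVM's JIT code cache.
    events.groupBy("user").count().show();
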
      >>> 3. Scala 2.12 will come out soon, and we will want to add support
      >>> for it. Scala 2.12 only works on Java 8. If we do support Java 7,
      >>> we'd have a fairly complicated compatibility matrix and testing
      >>> infrastructure.
     >>>
      >>> 4. There are libraries I've looked into in the past that support
      >>> only Java 8. This is more common in high-performance libraries such
      >>> as Aeron (a messaging library). Having to support Java 7 means we
      >>> are not able to use these. It is not that big of a deal right now,
      >>> but it will become increasingly limiting as we optimize
      >>> performance.
     >>>
     >>>
      >>> The downside of not supporting Java 7 is also obvious. Some
      >>> organizations are stuck with Java 7, and they wouldn't be able to
      >>> use Spark 2.0 without upgrading Java.
     >>>
     >>>
     >>
     >> --
     >> Jean-Baptiste Onofré
      >> jbono...@apache.org
     >> http://blog.nanthrax.net
     >> Talend - http://www.talend.com





--
Those who say it can't be done, are usually interrupted by those doing it.

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
