The container Java version can differ from the YARN Java version: we run jobs with JDK 8 on a JDK 7 cluster without issues.
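A minimal sketch of one way to set this up, assuming a JDK 8 is already installed at the same path on every cluster node (the path below is hypothetical). It points the YARN application master and the executors at that JDK through Spark's `spark.yarn.appMasterEnv.JAVA_HOME` and `spark.executorEnv.JAVA_HOME` settings. The helper name, jar and main class are illustrative only, and this is not necessarily how any particular cluster in this thread is configured.

```python
# Hypothetical JDK 8 location on the cluster nodes; adjust to the real path.
JDK8_HOME = "/usr/lib/jvm/java-8"


def spark_submit_args(app_jar, main_class, java_home=JDK8_HOME):
    """Build a spark-submit command line whose YARN containers use java_home."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # JAVA_HOME for the YARN application master (the driver in cluster mode).
        "--conf", "spark.yarn.appMasterEnv.JAVA_HOME=" + java_home,
        # JAVA_HOME for the executor containers.
        "--conf", "spark.executorEnv.JAVA_HOME=" + java_home,
        "--class", main_class,
        app_jar,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(spark_submit_args("app.jar", "com.example.Main")))
```

The node managers themselves keep running on whatever Java the cluster admins installed; only the containers launched for this one application pick up the other JDK.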
Regards
Mridul

On Thursday, March 24, 2016, Koert Kuipers <ko...@tresata.com> wrote:

> i guess what i am saying is that in a yarn world the only hard restrictions left are the containers you run in, which means the hadoop version, java version and python version (if you use python).
>
> On Thu, Mar 24, 2016 at 12:39 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> The group will not upgrade to spark 2.0 themselves, but they are mostly fine with vendors like us deploying our application via yarn with whatever spark version we choose (and bundle, so they do not install it separately, they might not even be aware of what spark version we use). This all works because spark does not need to be on the cluster nodes, just on the one machine where our application gets launched. Having yarn is pretty awesome in this respect.
>>
>> On Thu, Mar 24, 2016 at 12:25 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> (PS CDH5 runs fine with Java 8, but I understand your more general point.)
>>>
>>> This is a familiar context indeed, but in that context, would a group not wanting to update to Java 8 want to manually put Spark 2.0 into the mix? That is, if this is a context where the cluster is purposefully some stable mix of components, would you be updating just one?
>>>
>>> You make a good point about Scala being more library than infrastructure component. So it can be updated on a per-app basis. While it's harder to handle different Scala versions from the framework side, it's less hard on the deployment side.
>>>
>>> On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>> > i think the arguments are convincing, but it also makes me wonder if i live in some kind of alternate universe...
>>> > we deploy on customers' clusters, where the OS, python version, java version and hadoop distro are not chosen by us. so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply have access to a single proxy machine and launch through yarn. asking them to upgrade java is pretty much out of the question or a 6+ month ordeal. of the 10 client clusters i can think of off the top of my head all of them are on java 7, none are on java 8. so by doing this you would make spark 2 basically unusable for us (unless most of them have plans of upgrading in the near term to java 8, i will ask around and report back...).
>>> >
>>> > on a side note, it's particularly interesting to me that spark 2 chose to continue support for scala 2.10, because even for us in our very constricted client environments the scala version is something we can easily upgrade (we just deploy a custom build of spark for the relevant scala version and hadoop distro). and because scala is not a dependency of any hadoop distro (so not on classpath, which i am very happy about) we can use whatever scala version we like. also i found the upgrade path from scala 2.10 to 2.11 to be very easy, so i have a hard time understanding why anyone would stay on scala 2.10. and finally with scala 2.12 around the corner you really don't want to be supporting 3 versions. so clearly i am missing something here.
>>> >
>>> > On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>> >>
>>> >> +1 to support Java 8 (and future) *only* in Spark 2.0, and end support of Java 7. It makes sense.
>>> >>
>>> >> Regards
>>> >> JB
>>> >>
>>> >> On 03/24/2016 08:27 AM, Reynold Xin wrote:
>>> >>>
>>> >>> About a year ago we decided to drop Java 6 support in Spark 1.5. I am wondering if we should also just drop Java 7 support in Spark 2.0 (i.e. Spark 2.0 would require Java 8 to run).
>>> >>>
>>> >>> Oracle ended public updates for JDK 7 one year ago (Apr 2015), and removed public downloads for JDK 7 in July 2015. In the past I've actually been against dropping Java 7, but today I ran into an issue with the new Dataset API not working well with Java 8 lambdas, and that changed my opinion on this.
>>> >>>
>>> >>> I've been thinking more about this issue today and also talked with a lot of people offline to gather feedback, and I actually think the pros outweigh the cons, for the following reasons (in some rough order of importance):
>>> >>>
>>> >>> 1. It is complicated to test how well Spark APIs work for Java lambdas if we support Java 7. Jenkins machines need to have both Java 7 and Java 8 installed and we must run through a set of test suites in 7, and then the lambda tests in Java 8. This complicates build environments/scripts, and makes them less robust. Without good testing infrastructure, I have no confidence in building good APIs for Java 8.
>>> >>>
>>> >>> 2. Dataset/DataFrame performance will be between 1x and 10x slower in Java 7. The primary APIs we want users to use in Spark 2.x are Dataset/DataFrame, and this impacts pretty much everything from machine learning to structured streaming. We have made great progress in their performance through extensive use of code generation. (In many dimensions Spark 2.0 with DataFrames/Datasets looks more like a compiler than a MapReduce or query engine.)
>>> >>> These optimizations don't work well in Java 7 due to broken code cache flushing. This problem has been fixed by Oracle in Java 8. In addition, Java 8 comes with better support for Unsafe and SIMD.
>>> >>>
>>> >>> 3. Scala 2.12 will come out soon, and we will want to add support for that. Scala 2.12 only works on Java 8. If we do support Java 7, we'd have a fairly complicated compatibility matrix and testing infrastructure.
>>> >>>
>>> >>> 4. There are libraries that I've looked into in the past that support only Java 8. This is more common in high performance libraries such as Aeron (a messaging library). Having to support Java 7 means we are not able to use these. It is not that big of a deal right now, but will become increasingly difficult as we optimize performance.
>>> >>>
>>> >>> The downside of not supporting Java 7 is also obvious. Some organizations are stuck with Java 7, and they wouldn't be able to use Spark 2.0 without upgrading Java.
>>> >>
>>> >> --
>>> >> Jean-Baptiste Onofré
>>> >> jbono...@apache.org
>>> >> http://blog.nanthrax.net
>>> >> Talend - http://www.talend.com
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> >> For additional commands, e-mail: dev-h...@spark.apache.org