+1

Ran local tests and tested our Spark apps on a Spark + YARN cluster.
Cheers,
Sean

> On Mar 8, 2015, at 11:51 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
> +1 (non-binding, doc and packaging issues aside)
>
> Built from source, ran jobs and spark-shell against a pseudo-distributed
> YARN cluster.
>
> On Sun, Mar 8, 2015 at 2:42 PM, Krishna Sankar <ksanka...@gmail.com> wrote:
>
>> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
>> Distributions X ...
>>
>> Maybe one option is to have a minimum basic set (which I know is what we
>> are discussing) and move the rest to spark-packages.org. There the vendors
>> can add the latest downloads - for example when 1.4 is released, HDP can
>> build a release of an HDP Spark 1.4 bundle.
>>
>> Cheers
>> <k/>
>>
>> On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>
>>> We probably want to revisit the way we do binaries in general for
>>> 1.4+. IMO, something worth forking a separate thread for.
>>>
>>> I've been hesitating to add new binaries because people
>>> (understandably) complain if you ever stop packaging older ones, but
>>> on the other hand the ASF has complained that we have too many
>>> binaries already and that we need to pare it down because of the large
>>> volume of files. Doubling the number of binaries we produce for Scala
>>> 2.11 seemed like it would be too much.
>>>
>>> One solution potentially is to actually package "Hadoop provided"
>>> binaries and encourage users to use these by simply setting
>>> HADOOP_HOME, or have instructions for specific distros. I've heard
>>> that our existing packages don't work well on HDP for instance, since
>>> there are some configuration quirks that differ from the upstream
>>> Hadoop.
>>>
>>> If we cut down on the cross building for Hadoop versions, then it is
>>> more tenable to cross build for Scala versions without exploding the
>>> number of binaries.
>>>
>>> - Patrick
>>>
>>> On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen <so...@cloudera.com> wrote:
>>>> Yeah, interesting question of what is the better default for the
>>>> single set of artifacts published to Maven. I think there's an
>>>> argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
>>>> and cons discussed more at
>>>>
>>>> https://issues.apache.org/jira/browse/SPARK-5134
>>>> https://github.com/apache/spark/pull/3917
>>>>
>>>> On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>>> +1
>>>>>
>>>>> Tested it on Mac OS X.
>>>>>
>>>>> One small issue I noticed is that the Scala 2.11 build is using Hadoop 1
>>>>> without Hive, which is kind of weird because people will more likely want
>>>>> Hadoop 2 with Hive. So it would be good to publish a build for that
>>>>> configuration instead. We can do it if we do a new RC, or it might be that
>>>>> binary builds may not need to be voted on (I forgot the details there).
>>>>>
>>>>> Matei
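
For anyone following the "Hadoop provided" idea Patrick floats above, here is a
minimal sketch of what the user-facing side could look like: a binary built
without bundled Hadoop classes, pointed at the cluster's own Hadoop jars from
conf/spark-env.sh. Treat it as an illustration only; the HADOOP_HOME path and
the SPARK_DIST_CLASSPATH wiring below are assumptions about how such a package
would be consumed, not something agreed in this thread.

    # conf/spark-env.sh for a "Hadoop provided" Spark binary (sketch only)
    # Point Spark at the Hadoop installation already on the node,
    # e.g. the one shipped by HDP/CDH or a stock Apache Hadoop tarball.
    export HADOOP_HOME=/usr/lib/hadoop   # assumed install location
    # 'hadoop classpath' prints the distro's client classpath; Spark adds it
    # to its own classpath via SPARK_DIST_CLASSPATH.
    export SPARK_DIST_CLASSPATH=$("$HADOOP_HOME/bin/hadoop" classpath)

The appeal is that one such binary runs against whatever Hadoop version the
distro ships, so cross-building for Scala 2.10/2.11 no longer multiplies with
the Hadoop version matrix.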