Forgot Reply To All ;o(

---------- Forwarded message ----------
From: Krishna Sankar <ksanka...@gmail.com>
Date: Wed, Dec 10, 2014 at 9:16 PM
Subject: Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
To: Matei Zaharia <matei.zaha...@gmail.com>
+1
Works the same as RC1.

1. Compiled on OS X 10.10 (Yosemite)
   mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
   13:07 min
2. Tested pyspark and MLlib: ran the code and compared results with 1.1.x
2.1. statistics OK
2.2. Linear/Ridge/Lasso Regression OK
     Slight difference (vs. 1.1.x) in how the model object prints: it now includes a label and more details. This is good.
2.3. Decision Tree, Naive Bayes OK
     print(model) changed; I now use print(model.toDebugString()). OK
     Some changes in NaiveBayes compared with my 1.1.x code: I had to flatten list structures, and zip required the same number of elements in each partition. After the code changes it ran fine.
2.4. KMeans OK
     Center and Scale OK
     zip occasionally fails with the error "localhost): org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition"
     Has https://issues.apache.org/jira/browse/SPARK-2251 reappeared?
     Made it work by doing a different transformation, i.e. reusing an original RDD.
     (Xiangrui, I will send you the IPython Notebook and the dataset in a separate e-mail.)
2.5. rdd operations OK
     State of the Union texts: MapReduce, filter, sortByKey (word count)
2.6. recommendation OK
2.7. Good work! In 1.x.x, I had a map distinct over the MovieLens medium dataset which never worked. It works fine in 1.2.0!
3. Scala MLlib: a subset of the examples in #2 above, in Scala
3.1. statistics OK
3.2. Linear Regression OK
3.3. Decision Tree OK
3.4. KMeans OK

Cheers
<k/>

On Wed, Dec 10, 2014 at 3:05 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> +1
>
> Tested on Mac OS X.
>
> Matei
>
> On Dec 10, 2014, at 1:08 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version 1.2.0!
> >
> > The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):
> > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=a428c446e23e628b746e0626cc02b7b3cadf588e
> >
> > The release files, including signatures, digests, etc.
> > can be found at:
> > http://people.apache.org/~pwendell/spark-1.2.0-rc2/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1055/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-1.2.0-rc2-docs/
> >
> > Please vote on releasing this package as Apache Spark 1.2.0!
> >
> > The vote is open until Saturday, December 13, at 21:00 UTC and passes
> > if a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.2.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see
> > http://spark.apache.org/
> >
> > == What justifies a -1 vote for this release? ==
> > This vote is happening relatively late into the QA period, so
> > -1 votes should only occur for significant regressions from
> > 1.0.2. Bugs already present in 1.1.X, minor
> > regressions, or bugs related to new features will not block this
> > release.
> >
> > == What default changes should I be aware of? ==
> > 1. The default value of "spark.shuffle.blockTransferService" has been
> > changed to "netty"
> > --> Old behavior can be restored by switching to "nio"
> >
> > 2. The default value of "spark.shuffle.manager" has been changed to "sort".
> > --> Old behavior can be restored by setting "spark.shuffle.manager" to "hash".
> >
> > == How does this differ from RC1 ==
> > This has fixes for a handful of issues identified - some of the
> > notable fixes are:
> >
> > [Core]
> > SPARK-4498: Standalone Master can fail to recognize completed/failed
> > applications
> >
> > [SQL]
> > SPARK-4552: Query for empty parquet table in spark sql hive get
> > IllegalArgumentException
> > SPARK-4753: Parquet2 does not prune based on OR filters on partition columns
> > SPARK-4761: With JDBC server, set Kryo as default serializer and
> > disable reference tracking
> > SPARK-4785: When called with arguments referring column fields, PMOD throws NPE
> >
> > - Patrick
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
> >
>
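The zip failure in item 2.4 comes from a precondition of zip. The RDD.zip documentation says both RDDs must have the same number of partitions and the same number of elements in each partition. Its example of when this holds is one RDD made through a map on the other, which is why reusing an original RDD worked. The following is a plain-Python model of that check, not Spark code. `zip_partitions` is a hypothetical helper, and the lists of lists stand in for RDD partitions.

```python
# Hypothetical helper: a plain-Python model of RDD.zip's preconditions.
# Lists of lists stand in for RDD partitions; Spark itself is not used.
from typing import List, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def zip_partitions(left: Sequence[Sequence[A]],
                   right: Sequence[Sequence[B]]) -> List[List[Tuple[A, B]]]:
    """Pair elements partition by partition, failing like zip does on mismatches."""
    if len(left) != len(right):
        # zip also assumes both sides have the same number of partitions.
        raise ValueError("cannot zip: unequal numbers of partitions")
    zipped: List[List[Tuple[A, B]]] = []
    for i, (lp, rp) in enumerate(zip(left, right)):
        if len(lp) != len(rp):
            # Same condition as the SparkException quoted in item 2.4.
            raise ValueError(
                "Can only zip RDDs with same number of elements in each "
                f"partition (partition {i}: {len(lp)} vs {len(rp)})")
        zipped.append(list(zip(lp, rp)))
    return zipped
```

For example, zip_partitions([[1, 2], [3]], [["a", "b"], ["c"]]) returns [[(1, "a"), (2, "b")], [(3, "c")]]. Filtering only one side first raises the error: dropping 1 turns the left side into [[2], [3]], which no longer matches. Deriving both sides from the same original RDD with element-preserving maps keeps the per-partition counts aligned.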
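The "default changes" section lists two properties whose 1.1.x values restore the old shuffle behavior. One way to set them per application is spark-submit's `--conf key=value` option. Another is to put them as lines in conf/spark-defaults.conf. The following snippet only assembles the `--conf` arguments. The helper name, the constant name and the application placeholder in the comment are illustrative assumptions, not part of the thread.

```python
# Hypothetical helper: builds spark-submit arguments that restore the
# pre-1.2.0 shuffle defaults named in the RC2 vote e-mail.
from typing import Dict, List

# Old values, per the "default changes" section of the vote e-mail.
PRE_1_2_SHUFFLE_DEFAULTS: Dict[str, str] = {
    "spark.shuffle.blockTransferService": "nio",  # 1.2.0 default: "netty"
    "spark.shuffle.manager": "hash",              # 1.2.0 default: "sort"
}


def conf_args(settings: Dict[str, str]) -> List[str]:
    """Render settings as repeated `--conf key=value` pairs, sorted by key."""
    args: List[str] = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Prepend these to the usual spark-submit command line for an application.
    print(" ".join(conf_args(PRE_1_2_SHUFFLE_DEFAULTS)))
```

Running it prints `--conf spark.shuffle.blockTransferService=nio --conf spark.shuffle.manager=hash`.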
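The pass condition in the vote e-mail reads "a majority of at least 3 +1 PMC votes". That matches the usual Apache majority-approval rule: at least three binding +1 votes, and more binding +1 votes than -1 votes. The following is a minimal model of that rule. It assumes only binding (PMC) votes are counted and ignores the vote deadline. `release_vote_passes` is a hypothetical name.

```python
# Hypothetical helper: models the release-vote pass condition as majority
# approval over binding (PMC) votes only; the voting deadline is not modeled.
def release_vote_passes(binding_plus_ones: int, binding_minus_ones: int) -> bool:
    """True if there are >= 3 binding +1 votes and more binding +1s than -1s."""
    if binding_plus_ones < 0 or binding_minus_ones < 0:
        raise ValueError("vote counts must be non-negative")
    return binding_plus_ones >= 3 and binding_plus_ones > binding_minus_ones
```

Under this model, three binding +1 votes with no -1 votes pass. Two +1 votes fail, and so does a 3 to 3 tie.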