Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

Peter Rudenko Mon, 01 Jun 2015 10:45:49 -0700

Still having a problem using HiveContext from sbt. Here’s an example of the dependencies:

    val sparkVersion = "1.4.0-rc3"

    lazy val root = Project(
      id = "spark-hive",
      base = file("."),
      settings = Project.defaultSettings ++ Seq(
        name := "spark-1.4-hive",
        scalaVersion := "2.10.5",
        scalaBinaryVersion := "2.10",
        resolvers += "Spark RC" at "https://repository.apache.org/content/repositories/orgapachespark-1110/",
        libraryDependencies ++= Seq(
          "org.apache.spark" %% "spark-core" % sparkVersion,
          "org.apache.spark" %% "spark-mllib" % sparkVersion,
          "org.apache.spark" %% "spark-hive" % sparkVersion,
          "org.apache.spark" %% "spark-sql" % sparkVersion
        )
      )
    )


Launching sbt console with it and running:

    val conf = new SparkConf().setMaster("local[4]").setAppName("test")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val data = sc.parallelize(1 to 10000)
    import sqlContext.implicits._

    scala> data.toDF
    java.lang.IllegalArgumentException: Unable to locate hive jars to connect to metastore using classloader scala.tools.nsc.interpreter.IMain$TranslatingClassLoader. Please set spark.sql.hive.metastore.jars
        at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:206)
        at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:175)
        at org.apache.spark.sql.hive.HiveContext$anon$2.<init>(HiveContext.scala:367)
        at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:367)
        at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:366)
        at org.apache.spark.sql.hive.HiveContext$anon$1.<init>(HiveContext.scala:379)
        at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:379)
        at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:378)
        at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:901)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:134)
        at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
        at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:474)
        at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:456)
        at org.apache.spark.sql.SQLContext$implicits$.intRddToDataFrameHolder(SQLContext.scala:345)


Thanks,
Peter Rudenko

On 2015-06-01 05:04, Guoqiang Li wrote:

+1 (non-binding)


------------------ Original ------------------
*From: * "Sandy Ryza";<[email protected]>;
*Date: * Mon, Jun 1, 2015 07:34 AM
*To: * "Krishna Sankar"<[email protected]>;

*Cc: * "Patrick Wendell"<[email protected]>;"[email protected]"<[email protected]>;

*Subject: * Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

+1 (non-binding)

Launched against a pseudo-distributed YARN cluster running Hadoop 2.6.0 and ran some jobs.


-Sandy

On Sat, May 30, 2015 at 3:44 PM, Krishna Sankar <[email protected]> wrote:


    +1 (non-binding, of course)

    1. Compiled OSX 10.10 (Yosemite) OK Total time: 17:07 min
         mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
    -Dhadoop.version=2.6.0 -DskipTests
    2. Tested pyspark, MLlib - ran jobs and compared results with
    1.3.1
    2.1. statistics (min,max,mean,Pearson,Spearman) OK
    2.2. Linear/Ridge/Lasso Regression OK
    2.3. Decision Tree, Naive Bayes OK
    2.4. KMeans OK
           Center And Scale OK
    2.5. RDD operations OK
          State of the Union Texts - MapReduce, Filter, sortByKey (word
    count)
    2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
           Model evaluation/optimization (rank, numIter, lambda) with
    itertools OK
    3. Scala - MLlib
    3.1. statistics (min,max,mean,Pearson,Spearman) OK
    3.2. LinearRegressionWithSGD OK
    3.3. Decision Tree OK
    3.4. KMeans OK
    3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
    3.6. saveAsParquetFile OK
    3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile,
    registerTempTable, sql OK
    3.8. result = sqlContext.sql("SELECT
    OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM
    Orders INNER JOIN OrderDetails ON Orders.OrderID =
    OrderDetails.OrderID") OK
    4.0. Spark SQL from Python OK
    4.1. result = sqlContext.sql("SELECT * from people WHERE State =
    'WA'") OK

    Cheers
    <k/>

    On Fri, May 29, 2015 at 4:40 PM, Patrick Wendell
    <[email protected]> wrote:

        Please vote on releasing the following candidate as Apache
        Spark version 1.4.0!

        The tag to be voted on is v1.4.0-rc3 (commit dd109a8):
        https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730

        The release files, including signatures, digests, etc. can be
        found at:
        http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/

        Release artifacts are signed with the following key:
        https://people.apache.org/keys/committer/pwendell.asc

        The staging repository for this release can be found at:
        [published as version: 1.4.0]
        https://repository.apache.org/content/repositories/orgapachespark-1109/
        [published as version: 1.4.0-rc3]
        https://repository.apache.org/content/repositories/orgapachespark-1110/

        The documentation corresponding to this release can be found at:
        http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/

        Please vote on releasing this package as Apache Spark 1.4.0!

        The vote is open until Tuesday, June 02, at 00:32 UTC and passes
        if a majority of at least 3 +1 PMC votes are cast.

        [ ] +1 Release this package as Apache Spark 1.4.0
        [ ] -1 Do not release this package because ...

        To learn more about Apache Spark, please see
        http://spark.apache.org/

        == What has changed since RC1 ==
        Below is a list of bug fixes that went into this RC:
        http://s.apache.org/vN

        == How can I help test this release? ==
        If you are a Spark user, you can help us test this release by
        taking a Spark 1.3 workload and running on this release candidate,
        then reporting any regressions.

        == What justifies a -1 vote for this release? ==
        This vote is happening towards the end of the 1.4 QA period,
        so -1 votes should only occur for significant regressions from
        1.3.1.
        Bugs already present in 1.3.X, minor regressions, or bugs related
        to new features will not block this release.

