It's no longer valid to start more than one instance of HiveContext in a
single JVM, since one of the goals of this refactoring was to allow a single
context to connect to more than one metastore.

For tests I suggest you use TestHive, as we do in our unit tests.  It has a
reset() method you can use to clean up state between tests/suites.
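
For example, here is a minimal sketch of that pattern (it assumes ScalaTest and
the spark-hive test artifact are on the classpath; the suite and table names
are made up for illustration):

  import org.apache.spark.sql.hive.test.TestHive
  import org.scalatest.{BeforeAndAfterEach, FunSuite}

  class HiveQuerySmokeSuite extends FunSuite with BeforeAndAfterEach {

    // Drop temp tables and any other Hive state created by each test.
    override def afterEach(): Unit = TestHive.reset()

    test("count rows through the shared TestHive context") {
      import TestHive.implicits._
      val df = TestHive.sparkContext.parallelize(1 to 10).toDF("i")
      df.registerTempTable("numbers")
      assert(TestHive.sql("SELECT count(*) FROM numbers").collect().head.getLong(0) === 10)
    }
  }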

We could also add an explicit close() method to remove this restriction,
but if that's something you want to investigate we should move the discussion
off the vote thread and onto JIRA.

On Tue, Jun 2, 2015 at 7:19 AM, Peter Rudenko <petro.rude...@gmail.com>
wrote:

>  Thanks Yin, I tried on a clean VM and it works now. But the tests in my app
> still fail:
>
> [info]   Cause: javax.jdo.JDOFatalDataStoreException: Unable to open a test 
> connection to the given database. JDBC url = 
> jdbc:derby:;databaseName=metastore_db;create=true, username = APP. 
> Terminating connection pool (set lazyInit to true if you expect to start your 
> database after your app). Original Exception: ------
> [info] java.sql.SQLException: Failed to start database 'metastore_db' with 
> class loader 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$anon$1@380628de, see 
> the next exception for details.
> [info]     at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
> [info]     at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown 
> Source)
> [info]     at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
> [info]     at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown 
> Source)
> [info]     at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown 
> Source)
> [info]     at org.apache.derby.impl.jdbc.EmbedConnection40.<init>(Unknown 
> Source)
> [info]     at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown 
> Source)
> [info]     at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
> [info]     at org.apache.derby.jdbc.Driver20.connect(Unknown Source)
> [info]     at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
> [info]     at java.sql.DriverManager.getConnection(DriverManager.java:571)
> [info]     at java.sql.DriverManager.getConnection(DriverManager.java:187)
> [info]     at 
> com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
> [info]     at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
> [info]     at 
> com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:120)
>
> I’ve set
>
> parallelExecution in Test := false,
>
> Thanks,
> Peter Rudenko
>
> On 2015-06-01 21:10, Yin Huai wrote:
>
>   Hi Peter,
>
>  Based on your error message, it seems you were not using RC3. For the
> error thrown at HiveContext's line 206, we changed the message to this one
> <https://github.com/apache/spark/blob/v1.4.0-rc3/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L205-207>
> just before RC3; basically, we no longer print out the class loader name.
> Can you check whether an older version of the 1.4 branch got picked up? Have
> you published an RC3 to your local maven repo? Can you clean your local repo
> cache and try again?
>
>  Thanks,
>
>  Yin
>
> On Mon, Jun 1, 2015 at 10:45 AM, Peter Rudenko <petro.rude...@gmail.com>
> wrote:
>
>>  I still have a problem using HiveContext from sbt. Here’s an example of
>> dependencies:
>>
>>  val sparkVersion = "1.4.0-rc3"
>>
>>     lazy val root = Project(id = "spark-hive", base = file("."),
>>        settings = Project.defaultSettings ++ Seq(
>>        name := "spark-1.4-hive",
>>        scalaVersion := "2.10.5",
>>        scalaBinaryVersion := "2.10",
>>        resolvers += "Spark RC" at "https://repository.apache.org/content/repositories/orgapachespark-1110/",
>>        libraryDependencies ++= Seq(
>>          "org.apache.spark" %% "spark-core" % sparkVersion,
>>          "org.apache.spark" %% "spark-mllib" % sparkVersion,
>>          "org.apache.spark" %% "spark-hive" % sparkVersion,
>>          "org.apache.spark" %% "spark-sql" % sparkVersion
>>         )
>>
>>   ))
>>
>> Launching sbt console with it and running:
>>
>> val conf = new SparkConf().setMaster("local[4]").setAppName("test")
>> val sc = new SparkContext(conf)
>> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> val data = sc.parallelize(1 to 10000)
>> import sqlContext.implicits._
>> scala> data.toDF
>> java.lang.IllegalArgumentException: Unable to locate hive jars to connect to 
>> metastore using classloader 
>> scala.tools.nsc.interpreter.IMain$TranslatingClassLoader. Please set 
>> spark.sql.hive.metastore.jars
>>     at 
>> org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:206)
>>     at 
>> org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:175)
>>     at 
>> org.apache.spark.sql.hive.HiveContext$anon$2.<init>(HiveContext.scala:367)
>>     at 
>> org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:367)
>>     at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:366)
>>     at 
>> org.apache.spark.sql.hive.HiveContext$anon$1.<init>(HiveContext.scala:379)
>>     at 
>> org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:379)
>>     at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:378)
>>     at 
>> org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:901)
>>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:134)
>>     at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
>>     at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:474)
>>     at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:456)
>>     at 
>> org.apache.spark.sql.SQLContext$implicits$.intRddToDataFrameHolder(SQLContext.scala:345)
>>
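>> (A possible workaround hinted at by the error above, as an untested sketch:
>> spark.sql.hive.metastore.jars can be set explicitly before the metastore is
>> first touched, e.g. to "maven" so the Hive client jars are resolved from
>> Maven rather than from the REPL class loader:)
>>
>> val conf = new SparkConf()
>>   .setMaster("local[4]")
>>   .setAppName("test")
>>   .set("spark.sql.hive.metastore.jars", "maven") // or an explicit classpath of Hive jars
>> val sc = new SparkContext(conf)
>> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>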
>> Thanks,
>> Peter Rudenko
>>
>> On 2015-06-01 05:04, Guoqiang Li wrote:
>>
>> +1 (non-binding)
>>
>>
>>  ------------------ Original ------------------
>>  *From: * "Sandy Ryza" <sandy.r...@cloudera.com>
>> *Date: * Mon, Jun 1, 2015 07:34 AM
>> *To: * "Krishna Sankar" <ksanka...@gmail.com>
>> *Cc: * "Patrick Wendell" <pwend...@gmail.com>; "dev@spark.apache.org" <dev@spark.apache.org>
>> *Subject: * Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
>>
>>  +1 (non-binding)
>>
>>  Launched against a pseudo-distributed YARN cluster running Hadoop 2.6.0
>> and ran some jobs.
>>
>>  -Sandy
>>
>> On Sat, May 30, 2015 at 3:44 PM, Krishna Sankar <ksanka...@gmail.com>
>> wrote:
>>
>>>  +1 (non-binding, of course)
>>>
>>>  1. Compiled on OS X 10.10 (Yosemite) OK. Total time: 17:07 min
>>>      mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
>>> -Dhadoop.version=2.6.0 -DskipTests
>>> 2. Tested pyspark, mllib - ran them as well as compared results with 1.3.1
>>> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
>>> 2.2. Linear/Ridge/Lasso Regression OK
>>> 2.3. Decision Tree, Naive Bayes OK
>>> 2.4. KMeans OK
>>>        Center And Scale OK
>>> 2.5. RDD operations OK
>>>       State of the Union Texts - MapReduce, Filter, sortByKey (word count)
>>> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>>        Model evaluation/optimization (rank, numIter, lambda) with
>>> itertools OK
>>> 3. Scala - MLlib
>>> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
>>> 3.2. LinearRegressionWithSGD OK
>>> 3.3. Decision Tree OK
>>> 3.4. KMeans OK
>>> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>> 3.6. saveAsParquetFile OK
>>> 3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile,
>>> registerTempTable, sql OK
>>> 3.8. result = sqlContext.sql("SELECT
>>> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
>>> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
>>> 4.0. Spark SQL from Python OK
>>> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'")
>>> OK
>>>
>>>  Cheers
>>>  <k/>
>>>
>>> On Fri, May 29, 2015 at 4:40 PM, Patrick Wendell <pwend...@gmail.com>
>>> wrote:
>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 1.4.0!
>>>>
>>>> The tag to be voted on is v1.4.0-rc3 (commit dd109a8):
>>>>
>>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/
>>>>
>>>> Release artifacts are signed with the following key:
>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>
>>>> The staging repository for this release can be found at:
>>>> [published as version: 1.4.0]
>>>> https://repository.apache.org/content/repositories/orgapachespark-1109/
>>>> [published as version: 1.4.0-rc3]
>>>> https://repository.apache.org/content/repositories/orgapachespark-1110/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/
>>>>
>>>> Please vote on releasing this package as Apache Spark 1.4.0!
>>>>
>>>> The vote is open until Tuesday, June 02, at 00:32 UTC and passes
>>>> if a majority of at least 3 +1 PMC votes are cast.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 1.4.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see
>>>> http://spark.apache.org/
>>>>
>>>> == What has changed since RC1 ==
>>>> Below is a list of bug fixes that went into this RC:
>>>> http://s.apache.org/vN
>>>>
>>>> == How can I help test this release? ==
>>>> If you are a Spark user, you can help us test this release by
>>>> taking a Spark 1.3 workload and running it on this release candidate,
>>>> then reporting any regressions.
>>>>
>>>> == What justifies a -1 vote for this release? ==
>>>> This vote is happening towards the end of the 1.4 QA period,
>>>> so -1 votes should only occur for significant regressions from 1.3.1.
>>>> Bugs already present in 1.3.X, minor regressions, or bugs related
>>>> to new features will not block this release.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>
>>>>
>>>
>>
>>
>
>
>
