Some more info; I'm still digging.
I'm just trying to do `spark.table("db.table").count` from a spark-shell.
"db.table" is just a Hive table.
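For completeness, the whole session is just this (a minimal repro sketch; it assumes db.table already exists in the Hive metastore and has rows):

  // Run inside bin/spark-shell; the shell already provides the `spark` SparkSession.
  // "db.table" stands in for any existing Hive table.
  val n = spark.table("db.table").count
  println(n)  // used to print the row count; now it throws the AnalysisException shown below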
At commit b67668b this worked just fine and it returned the number of rows in
db.table.
Starting at ca99171 "[SPARK-15073][SQL] Hide SparkSession constructor from the
public" it fails with:
org.apache.spark.sql.AnalysisException: Database 'db' does not exist;
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:37)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:195)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireTableExists(InMemoryCatalog.scala:63)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.getTable(InMemoryCatalog.scala:186)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:337)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:524)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:520)
... 48 elided
If I run
`org.apache.spark.sql.SparkSession.builder.enableHiveSupport.getOrCreate.catalog.listDatabases.show(false)`
I get:
+------------------------------------------------------------------------------------------------+-----------+-----------+
|name                                                                                            |description|locationUri|
+------------------------------------------------------------------------------------------------+-----------+-----------+
|Database[name='default', description='default database', path='hdfs://ns/{CWD}/spark-warehouse']|           |           |
+------------------------------------------------------------------------------------------------+-----------+-----------+
Here {CWD} is the current working directory from which I started my spark-shell.
It looks like this commit causes spark.catalog to be the internal one instead
of the Hive one.
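One way to double-check which catalog the session wired in (a best-effort sketch; it assumes the spark.sql.catalogImplementation and spark.sql.warehouse.dir keys exist in this preview build):

  // "hive" means the Hive external catalog, "in-memory" means the internal one.
  println(spark.sparkContext.getConf.get("spark.sql.catalogImplementation", "<not set>"))
  // The warehouse location the session resolved; by default it ends up as ./spark-warehouse under the CWD.
  println(spark.conf.get("spark.sql.warehouse.dir", "<not set>"))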
Michael, I don't think this is related to the HDFS configurations; they are in
/etc/hadoop/conf on each of the nodes in the cluster.
Arun, I was referring to these docs:
http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/sql-programming-guide.html
They need to be updated so that they no longer refer to HiveContext.
I don't think HiveContext should be marked as private[hive]; it should be
public.
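For comparison, this is what the SparkSession-based replacement for the old HiveContext usage looks like (a sketch based on the deprecation note Arun quoted below):

  import org.apache.spark.sql.SparkSession

  // Rough equivalent of `new org.apache.spark.sql.hive.HiveContext(sc)` in 2.0.
  val session = SparkSession.builder.enableHiveSupport.getOrCreate()
  session.sql("show databases").show()  // with Hive support wired in, this should list the metastore databases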
I’ll keep digging.
Doug
> On May 19, 2016, at 6:52 PM, Reynold Xin <[email protected]> wrote:
>
> The old one is deprecated but should still work though.
>
>
> On Thu, May 19, 2016 at 3:51 PM, Arun Allamsetty <[email protected]>
> wrote:
> Hi Doug,
>
> If you look at the API docs here:
> http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/api/scala/index.html#org.apache.spark.sql.hive.HiveContext,
> you'll see
> Deprecated (Since version 2.0.0) Use SparkSession.builder.enableHiveSupport
> instead
> So you probably need to use that.
>
> Arun
>
> On Thu, May 19, 2016 at 3:44 PM, Michael Armbrust <[email protected]>
> wrote:
> 1. "val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)" doesn't
> work because "HiveContext not a member of org.apache.spark.sql.hive". I
> checked the documentation, and it looks like it should still work for
> spark-2.0.0-preview-bin-hadoop2.7.tgz
>
> HiveContext has been deprecated and moved to a 1.x compatibility package,
> which you'll need to include explicitly. Docs have not been updated yet.
>
> 2. I also tried the new spark session, `spark.table("db.table")`; it fails
> with an HDFS permission denied error: it can't write to "/user/hive/warehouse".
>
> Where are the HDFS configurations located? We might not be propagating them
> correctly any more.
>
>