[ 
https://issues.apache.org/jira/browse/SPARK-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991989#comment-14991989
 ] 

Jeff Mink edited comment on SPARK-11021 at 11/5/15 5:06 PM:
------------------------------------------------------------

We came across a seemingly similar issue when using older versions of Hive 
through Spark (it's hard to tell, because the query that caused the error 
above isn't shown).

We are running Spark 1.5.1 with Hive 1.0. Any time we ran an INSERT OVERWRITE 
or CREATE TABLE AS SELECT through Spark's SQL context, we would see the 
following:

{noformat}
15/11/05 09:51:47 INFO output.FileOutputCommitter: Saved output of task 
'attempt_201511050951_0000_m_000000_0' to 
hdfs://development/user/jmink/jmink.db/test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1/-ext-10000/_temporary/0/task_201511050951_0000_m_000000
...
15/11/05 09:51:47 INFO common.FileUtils: deleting  
hdfs://development/user/jmink/jmink.db/test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1
{noformat}

The reason this happens is that later versions of Hive (1.2, I think?) 
introduced a new setting, 'hive.exec.stagingdir', which defaults to 
'.hive-staging'. This causes the staging data to be written to a 
subdirectory of the very table we are overwriting, so when our process 
reaches the OVERWRITE stage, it deletes the staging folder along with 
everything else in the table's location.
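To make the failure mode concrete, here is a minimal sketch using plain local filesystem operations in place of HDFS (the paths and file names are illustrative, not the actual Hive layout): when the staging directory lives inside the table's location, clearing the table for the overwrite destroys the staged results before they can be moved into place.

```python
import shutil
import tempfile
from pathlib import Path

# Simulate the layout: the default staging dir is a hidden
# subdirectory *inside* the table's own location.
table_dir = Path(tempfile.mkdtemp()) / "jmink.db" / "test"
staging = table_dir / ".hive-staging_hive_example"  # stands in for hive.exec.stagingdir
staging.mkdir(parents=True)
(staging / "part-00000").write_text("new rows")

# INSERT OVERWRITE first clears everything under the table directory...
for child in table_dir.iterdir():
    shutil.rmtree(child)

# ...and only then tries to move the staged output, which is already gone.
print(staging.exists())  # False: the staged results were deleted too
```

Pointing `hive.exec.stagingdir` outside the table's location (as below) avoids this, because the cleanup pass no longer sweeps up the staging directory.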

The fix for us was to edit '/opt/spark/hive-site.xml' and add the following 
entry (set the value to whatever works for your environment):

{noformat}
  <property>
    <name>hive.exec.stagingdir</name>
    <value>/tmp/hive/spark-${user.name}</value>
  </property>
{noformat}
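If editing hive-site.xml isn't convenient, the same property can also be passed per invocation via `--hiveconf` (a sketch; it assumes the stock spark-sql launcher, and the staging path and table names are just examples):

```shell
spark-sql \
  --hiveconf hive.exec.stagingdir=/tmp/hive/spark-${USER} \
  -e "INSERT OVERWRITE TABLE jmink.test SELECT * FROM source_table"
```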


> SparkSQL cli throws exception when using with Hive 0.12 metastore in 
> spark-1.5.0 version
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-11021
>                 URL: https://issues.apache.org/jira/browse/SPARK-11021
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: iward
>
> After upgrading Spark from 1.4.1 to 1.5.0, I get the following exception 
> when I set the following properties in spark-defaults.conf:
> {noformat}
> spark.sql.hive.metastore.version=0.12.0
> spark.sql.hive.metastore.jars=hive 0.12 jars and hadoop jars
> {noformat}
> When I run a task, it fails with the following exception:
> {noformat}
> java.lang.reflect.InvocationTargetException
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.spark.sql.hive.client.Shim_v0_12.loadTable(HiveShim.scala:249)
>       at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:489)
>       at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
>       at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
>       at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:256)
>       at 
> org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:211)
>       at 
> org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:248)
>       at 
> org.apache.spark.sql.hive.client.ClientWrapper.loadTable(ClientWrapper.scala:488)
>       at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
>       at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
>       at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:263)
>       at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
>       at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
>       at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
>       at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:927)
>       at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:927)
>       at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
>       at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
>       at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
>       at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:719)
>       at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:61)
>       at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
>       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
>       at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:165)
>       at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>       at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>       at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>  Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> results from 
> hdfs://ns1/user/dd_edw/warehouse/tmp/gdm_m10_afs_task_process_spark/.hive-staging_hive_2015-10-09_11-34-50_831_2280183503220873069-1/-ext-10000
>  to destination directory: 
> /user/dd_edw/warehouse/tmp/gdm_m10_afs_task_process_spark
>       at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2303)
>       at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:639)
>       at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1441)
>       ... 40 more
> {noformat}
> The task runs fine on spark-1.4.1 with hive 0.12.
> Is spark-1.5.0 incompatible with the hive 0.12 metastore? Any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
