[
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-21067:
------------------------------------
Assignee: Apache Spark
> Thrift Server - CTAS fail with Unable to move source
> ----------------------------------------------------
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0, 2.4.0, 2.4.3
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
> Reporter: Dominic Ricard
> Assignee: Apache Spark
> Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS
> would fail, sometimes...
> Most of the time, the CTAS would work only once after starting the Thrift
> Server. After that, dropping the table and re-issuing the same CTAS would
> fail with the following message (sometimes it fails right away, sometimes it
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-10000/part-00000 to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-00000; (state=,code=0)
> {noformat}
> We have already found the following Jira
> (https://issues.apache.org/jira/browse/SPARK-11021), which states that
> {{hive.exec.stagingdir}} had to be added in order for Spark to handle CREATE
> TABLE properly as of 2.0. As you can see in the error, we have ours set to
> "/tmp/hive-staging/\{user.name\}".
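> For reference, the corresponding hive-site.xml entry looks like this (a
> minimal sketch of our configuration; the property name is the one discussed
> in SPARK-11021, the value is ours):
> {noformat}
> <property>
>   <name>hive.exec.stagingdir</name>
>   <value>/tmp/hive-staging/{user.name}</value>
> </property>
> {noformat}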
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int);
> INSERT INTO TABLE dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-10000/part-00000 to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-00000; (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production
> Environment, but since 2.0+ we haven't been able to CREATE TABLE
> consistently on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE;
> CREATE SCHEMA dricard;
> CREATE TABLE dricard.test (col1 int);
> INSERT INTO TABLE dricard.test SELECT 1;
> SELECT * from dricard.test;
> DROP TABLE dricard.test;
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> The Thrift Server usually fails at the INSERT...
> We tried the same procedure in a Spark context using spark.sql() and did not
> encounter the same issue.
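> For completeness, the spark-shell equivalent looked roughly like this (a
> minimal sketch, assuming a Hive-enabled SparkSession; the statements are the
> same as in the SQL above, issued through spark.sql()):
> {noformat}
> // run in spark-shell on the same cluster, with Hive support enabled
> spark.sql("DROP SCHEMA IF EXISTS dricard CASCADE")
> spark.sql("CREATE SCHEMA dricard")
> spark.sql("CREATE TABLE dricard.test (col1 int)")
> spark.sql("INSERT INTO TABLE dricard.test SELECT 1")
> spark.sql("SELECT * FROM dricard.test").show()
> spark.sql("DROP TABLE dricard.test")
> spark.sql("CREATE TABLE dricard.test AS SELECT 1 AS `col1`")
> spark.sql("SELECT * FROM dricard.test").show()
> {noformat}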
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-10000/part-00000 to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-00000;
>     at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
>     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
>     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
>     at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:231)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-10000/part-00000 to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-00000
>     at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
>     at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:728)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:676)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:676)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:676)
>     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:279)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:226)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:225)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:268)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:675)
>     at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:768)
>     at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:766)
>     at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:766)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>     ... 28 more
> Caused by: java.io.IOException: Filesystem closed
>     at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>     at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:3288)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2093)
>     at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:289)
>     at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
>     at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2607)
>     ... 47 more
> {noformat}
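> The inner {{java.io.IOException: Filesystem closed}} suggests that a cached
> HDFS client shared across sessions was closed by one of them. A minimal
> sketch of that cache behaviour (illustrative only, not the actual Thrift
> Server code):
> {noformat}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
>
> val conf = new Configuration()
> val uri = new Path("hdfs://nameservice1/").toUri
> // With the default fs.hdfs.impl.disable.cache=false, FileSystem.get()
> // returns a JVM-wide cached instance keyed by (scheme, authority, user).
> val fs1 = FileSystem.get(uri, conf)
> val fs2 = FileSystem.get(uri, conf)
> assert(fs1 eq fs2)            // same cached instance
> fs1.close()                   // closing it in one place...
> fs2.exists(new Path("/tmp"))  // ...makes later calls throw "Filesystem closed"
> {noformat}
> If this is indeed the cause, disabling the client cache
> (fs.hdfs.impl.disable.cache=true) might work around it, at the cost of
> creating a new client per call.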