[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656631#comment-16656631 ]
Steve Loughran commented on SPARK-21725:
----------------------------------------
bq. can we fix it on the Hadoop side?
Fix what?
The only way to handle close() when more than one client shares an FS instance
would be to move to reference-counted filesystems everywhere. Otherwise:
* Applications which know they get a unique FS instance need to call close() on
it. This matters especially for those connectors (object stores, etc.) which
create thread pools, HTTP connection pools, etc.
* Applications which don't set up a unique FS instance must not call close().
Reference-counted FS clients would be the ultimate way to do this, but I
suspect it is too late for that; see HADOOP-10792, HADOOP-4655, etc.
The general assumption is: if you want to manage the lifespan of your FS
instance, create a unique one yourself with {{FileSystem.newInstance()}}. That
method has been there since 0.21, so there's no reason not to adopt it.
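A minimal Java sketch of the distinction, against the Hadoop {{FileSystem}}
API (the path here is illustrative only):
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsLifecycleSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/example"); // illustrative path only

    // FileSystem.get() hands back a cached instance which may be shared with
    // other code in the same JVM (e.g. a Hive client): do NOT close it.
    FileSystem shared = FileSystem.get(conf);
    shared.exists(path);

    // FileSystem.newInstance() hands back an uncached instance this code owns
    // outright, so closing it cannot invalidate anyone else's handle.
    FileSystem owned = FileSystem.newInstance(conf);
    try {
      owned.exists(path);
    } finally {
      owned.close(); // safe: only this private instance is shut down
    }
  }
}
{code}
The "Filesystem closed" error quoted below is the classic symptom of close()
being called on such a shared, cached instance.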
> spark thriftserver insert overwrite table partition select
> -----------------------------------------------------------
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1 jdk8
> Reporter: xinzhang
> Priority: Major
> Labels: spark-sql
>
> Use the Thrift server to create tables with partitions.
> session 1:
> SET hive.default.fileformat=Parquet;create table tmp_10(count bigint)
> partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 2:
> SET hive.default.fileformat=Parquet;create table tmp_11(count bigint)
> partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10
> partition(pt='1') select count(1) count from tmp_11;
> --ok
> !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10
> partition(pt='1') select count(1) count from tmp_11;
> --error
> !exit
> -------------------------------------------------------------------------------------
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing
> query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> ......
> ......
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move
> source
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-10000/part-00000 to destination
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-00000
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> ....
> -------------------------------------------------------------------------------------
> The doc describing the Parquet table conversion is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL
> will try to use its own Parquet support instead of Hive SerDe for better
> performance. This behavior is controlled by the
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by
> default.
> I am confused: the problem appears with partitioned tables but is fine with
> unpartitioned tables. Does that mean Spark does not use its own Parquet
> support here? Could someone suggest how I can avoid this issue?
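> A minimal sketch, assuming a {{SparkSession}} built with Hive support, of how
> the setting quoted above can be inspected or toggled for a single session
> (illustrative only):
> {code:java}
> import org.apache.spark.sql.SparkSession;
>
> public class ConvertMetastoreParquetSketch {
>   public static void main(String[] args) {
>     SparkSession spark = SparkSession.builder()
>         .appName("convertMetastoreParquet-sketch") // illustrative app name
>         .enableHiveSupport()
>         .getOrCreate();
>
>     // Read the current value; the docs quoted above say it defaults to true.
>     String current = spark.conf().get("spark.sql.hive.convertMetastoreParquet");
>     System.out.println("convertMetastoreParquet = " + current);
>
>     // Fall back to the Hive SerDe path for this session only.
>     spark.conf().set("spark.sql.hive.convertMetastoreParquet", "false");
>
>     spark.stop();
>   }
> }
> {code}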