[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226720#comment-16226720 ]
Marco Gaido commented on SPARK-21725:
-------------------------------------

[~zhangxin0112zx] I am sorry, but I am still unable to reproduce this locally. Here are the steps I performed. It might be related to the metastore. Could you provide more details about your installation and the logs of the Spark thriftserver?

{code:java}
➜  spark git:(SPARK-21725) ✗ ./bin/beeline -u "jdbc:hive2://localhost:10000"
Connecting to jdbc:hive2://localhost:10000
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.3.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000> set hive.default.fileformat=Parquet;
+--------------------------+----------+--+
|           key            |  value   |
+--------------------------+----------+--+
| hive.default.fileformat  | Parquet  |
+--------------------------+----------+--+
1 row selected (0.434 seconds)
0: jdbc:hive2://localhost:10000> create table default.test_e(name string) partitioned by (pt string);
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.472 seconds)
0: jdbc:hive2://localhost:10000> create table default.test_f(name string) partitioned by (pt string);
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.067 seconds)
0: jdbc:hive2://localhost:10000> !quit
Closing: 0: jdbc:hive2://localhost:10000
➜  spark git:(SPARK-21725) ✗ ./bin/beeline -u "jdbc:hive2://localhost:10000"
Connecting to jdbc:hive2://localhost:10000
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.3.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000> insert overwrite table default.test_e partition(pt="1") select count(1) from default.test_f;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (2.351 seconds)
0: jdbc:hive2://localhost:10000> !quit
Closing: 0: jdbc:hive2://localhost:10000
➜  spark git:(SPARK-21725) ✗ ./bin/beeline -u "jdbc:hive2://localhost:10000"
Connecting to jdbc:hive2://localhost:10000
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.3.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000> insert overwrite table default.test_e partition(pt="1") select count(1) from default.test_f;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.612 seconds)
0: jdbc:hive2://localhost:10000>
{code}

> spark thriftserver insert overwrite table partition select
> -----------------------------------------------------------
>
>                 Key: SPARK-21725
>                 URL: https://issues.apache.org/jira/browse/SPARK-21725
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>         Environment: centos 6.7 spark 2.1 jdk8
>            Reporter: xinzhang
>              Labels: spark-sql
>
> Use the thriftserver to create tables with partitions:
> session 1:
> SET hive.default.fileformat=Parquet;
> create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 2:
> SET hive.default.fileformat=Parquet;
> create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;
> insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --ok
> !exit
> session 4 (do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;
> insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --error
> !exit
> -------------------------------------------------------------------------------------
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> ......
> ......
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-10000/part-00000 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-00000
>         at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
>         at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
>         ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> ....
> -------------------------------------------------------------------------------------
> The doc describing Parquet tables is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> "Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default."
> I am confused: the problem appears with partitioned tables, but everything is fine with unpartitioned tables. Does that mean Spark does not use its own Parquet support here?
> Could someone suggest how I could avoid this issue?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
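On the reporter's closing question: the quoted docs say the Parquet path taken here is controlled by {{spark.sql.hive.convertMetastoreParquet}}. One diagnostic worth trying, sketched below against the {{tmp_10}}/{{tmp_11}} tables from the report (whether it actually avoids the "Filesystem closed" error is untested here; it only isolates whether Spark's built-in Parquet conversion is involved):

{code:sql}
-- Fall back to the Hive SerDe path for metastore Parquet tables
-- (the default for this setting is true).
set spark.sql.hive.convertMetastoreParquet=false;

-- Re-run the statement that failed in session 4 of the report.
insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
{code}

If the insert still fails with {{java.io.IOException: Filesystem closed}} after disabling the conversion, the conversion is likely not the cause and attention should shift to the thriftserver/metastore setup, as suggested in the comment above.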