[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226720#comment-16226720 ]
Marco Gaido commented on SPARK-21725:
-------------------------------------

[~zhangxin0112zx] I am sorry, but I am still unable to reproduce this locally. Here are the steps I performed. It might be related to the metastore. Could you provide more details about your installation and the logs of the Spark thriftserver?

{code:java}
➜  spark git:(SPARK-21725) ✗ ./bin/beeline -u "jdbc:hive2://localhost:10000"
Connecting to jdbc:hive2://localhost:10000
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.3.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000> set hive.default.fileformat=Parquet;
+--------------------------+----------+--+
|           key            |  value   |
+--------------------------+----------+--+
| hive.default.fileformat  | Parquet  |
+--------------------------+----------+--+
1 row selected (0.434 seconds)
0: jdbc:hive2://localhost:10000> create table default.test_e(name string) partitioned by (pt string);
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.472 seconds)
0: jdbc:hive2://localhost:10000> create table default.test_f(name string) partitioned by (pt string);
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.067 seconds)
0: jdbc:hive2://localhost:10000> !quit
Closing: 0: jdbc:hive2://localhost:10000
➜  spark git:(SPARK-21725) ✗ ./bin/beeline -u "jdbc:hive2://localhost:10000"
Connecting to jdbc:hive2://localhost:10000
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.3.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000> insert overwrite table default.test_e partition(pt="1") select count(1) from default.test_f;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (2.351 seconds)
0: jdbc:hive2://localhost:10000> !quit
Closing: 0: jdbc:hive2://localhost:10000
➜  spark git:(SPARK-21725) ✗ ./bin/beeline -u "jdbc:hive2://localhost:10000"
Connecting to jdbc:hive2://localhost:10000
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.3.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000> insert overwrite table default.test_e partition(pt="1") select count(1) from default.test_f;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.612 seconds)
0: jdbc:hive2://localhost:10000>
{code}

> spark thriftserver insert overwrite table partition select
> -----------------------------------------------------------
>
>                 Key: SPARK-21725
>                 URL: https://issues.apache.org/jira/browse/SPARK-21725
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>         Environment: centos 6.7 spark 2.1 jdk8
>            Reporter: xinzhang
>              Labels: spark-sql
>
> Use the thriftserver to create tables with partitions:
> session 1:
> SET hive.default.fileformat=Parquet;
> create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 2:
> SET hive.default.fileformat=Parquet;
> create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;
> insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --ok
> !exit
> session 4 (do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;
> insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --error
> !exit
> -------------------------------------------------------------------------------------
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> ......
> ......
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-10000/part-00000 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-00000
>         at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
>         at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
>         ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> ....
> -------------------------------------------------------------------------------------
> The doc describing Parquet tables is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> "Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default."
> I am confused: the problem appears with partitioned tables, but everything is fine with unpartitioned tables. Does that mean Spark does not use its own Parquet support here?
> Could someone suggest how I could avoid this issue?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
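On the reporter's closing question: the quoted docs say the Parquet path taken here is controlled by {{spark.sql.hive.convertMetastoreParquet}}. One diagnostic worth trying, sketched below against the {{tmp_10}}/{{tmp_11}} tables from the report (whether it actually avoids the "Filesystem closed" error is untested here; it only isolates whether Spark's built-in Parquet conversion is involved):

{code:sql}
-- Fall back to the Hive SerDe path for metastore Parquet tables
-- (the default for this setting is true).
set spark.sql.hive.convertMetastoreParquet=false;

-- Re-run the statement that failed in session 4 of the report.
insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
{code}

If the insert still fails with {{java.io.IOException: Filesystem closed}} after disabling the conversion, the conversion is likely not the cause and attention should shift to the thriftserver/metastore setup, as suggested in the comment above.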