[ https://issues.apache.org/jira/browse/SPARK-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162522#comment-16162522 ]
Rajesh Chandramohan commented on SPARK-14927:
---------------------------------------------
[~ctang.ma], I was referring to Hive 1.2 with Spark 2.1.0; in our environment we have not yet moved to Hive 2.0.
*Once Spark updates the Hive table, we can no longer access the table via Hive.*
spark-sql> insert into table zeta_dev_gdw_tables.rajesh_user1 values ( 1002);
After the above command runs, the layout of the table's Parquet files on HDFS changes, and subsequent reads through Hive fail with the error below.
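A commonly cited workaround (not part of the original report, so treat it as an assumption to verify): Spark 2.x by default writes some Parquet types, notably decimals, in a format that Hive 1.2's Parquet reader cannot decode; setting spark.sql.parquet.writeLegacyFormat before writing makes Spark emit Hive-compatible files. A minimal spark-sql session sketch:

```sql
-- Sketch of a possible workaround (assumes the table from the report):
-- write Parquet in the legacy, Hive-1.x-compatible layout.
SET spark.sql.parquet.writeLegacyFormat=true;
insert into table zeta_dev_gdw_tables.rajesh_user1 values (1002);
```

Note that this only affects files written after the setting is applied; files already written in the new format would need to be rewritten.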
------------------
hive> select * from zeta_dev_gdw_tables.rajesh_user1 ;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
details.
Failed with exception java.io.IOException:parquet.io.ParquetDecodingException:
Can not read value at 0 in block 0 in file
hdfs://cluster/user/hive/warehouse/zeta_dev_gdw_tables.db/rajesh_user1/part-00000-d4d0b321-3e7e-4525-9de0-d64fb223e9b2.snappy.parquet
Time taken: 1.495 seconds
hive>
Hive log:
Caused by: parquet.io.ParquetDecodingException: Can not read value at 0 in
block 0 in file
hdfs://cluster/user/hive/warehouse/zeta_dev_gdw_tables.db/rajesh_user1/part-00000-d4d0b321-3e7e-4525-9de0-d64fb223e9b2.snappy.parquet
	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228)
	at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:122)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
	at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:682)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:328)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:450)
	... 15 more
Caused by: java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$8$1

This should be related to this bug: HIVE-15082
> DataFrame. saveAsTable creates RDD partitions but not Hive partitions
> ---------------------------------------------------------------------
>
> Key: SPARK-14927
> URL: https://issues.apache.org/jira/browse/SPARK-14927
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2, 1.6.1
> Environment: Mac OS X 10.11.4 local
> Reporter: Sasha Ovsankin
>
> This is a follow-up to
> http://stackoverflow.com/questions/31341498/save-spark-dataframe-as-dynamic-partitioned-table-in-hive
> . I tried the suggestions in the answers but couldn't make them work in
> Spark 1.6.1.
> I am trying to create partitions programmatically from a `DataFrame`. Here is
> the relevant code (adapted from a Spark test):
> hc.setConf("hive.metastore.warehouse.dir", "tmp/tests")
> // hc.setConf("hive.exec.dynamic.partition", "true")
> // hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
> hc.sql("create database if not exists tmp")
> hc.sql("drop table if exists tmp.partitiontest1")
> Seq(2012 -> "a").toDF("year", "val")
>   .write
>   .partitionBy("year")
>   .mode(SaveMode.Append)
>   .saveAsTable("tmp.partitiontest1")
> hc.sql("show partitions tmp.partitiontest1").show
> Full file is here:
> https://gist.github.com/SashaOv/7c65f03a51c7e8f9c9e018cd42aa4c4a
> I get the error that the table is not partitioned:
> ======================
> HIVE FAILURE OUTPUT
> ======================
> SET hive.support.sql11.reserved.keywords=false
> SET hive.metastore.warehouse.dir=tmp/tests
> OK
> OK
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask. Table tmp.partitiontest1 is not a
> partitioned table
> ======================
> It looks like the root cause is that
> `org.apache.spark.sql.hive.HiveMetastoreCatalog.newSparkSQLSpecificMetastoreTable`
> always creates table with empty partitions.
> Any help to move this forward is appreciated.
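For the quoted issue, the workaround usually suggested (including in the linked Stack Overflow thread) is to create the partitioned table through Hive DDL first, so the metastore records the partition columns, and then append with insertInto rather than saveAsTable. A hedged sketch, reusing the `hc` HiveContext and `tmp.partitiontest1` table from the report:

```scala
// Sketch of a possible workaround, not a confirmed fix: pre-create the
// partitioned table via Hive DDL, then use insertInto, which resolves
// columns positionally (partition column last).
hc.setConf("hive.exec.dynamic.partition", "true")
hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
hc.sql(
  "create table if not exists tmp.partitiontest1 (val string) " +
  "partitioned by (year int) stored as parquet")
Seq(2012 -> "a").toDF("year", "val")
  .select("val", "year")      // reorder so the partition column comes last
  .write
  .mode(SaveMode.Append)
  .insertInto("tmp.partitiontest1")
hc.sql("show partitions tmp.partitiontest1").show
```

Because the table now exists in the metastore as a partitioned table, `show partitions` should list `year=2012` instead of failing with the DDLTask error above.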
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]