[
https://issues.apache.org/jira/browse/SPARK-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Lian updated SPARK-10177:
-------------------------------
Attachment: 000000_0
Attached is the Parquet file generated by the Hive 0.14.0 SQL statement mentioned
in the ticket description (Hive 1.2.1 should produce the same file).
Below are the results of {{parquet-tools}} inspections:
{noformat}
$ parquet-schema 000000_0
message hive_schema {
optional int96 _c0;
}
$ parquet-meta 000000_0
file: file:/Users/lian/Desktop/000000_0
creator: parquet-mr
file schema: hive_schema
---------------------------------------------------------------------------------------
_c0: OPTIONAL INT96 R:0 D:1
row group 1: RC:1 TS:67 OFFSET:4
---------------------------------------------------------------------------------------
_c0: INT96 UNCOMPRESSED DO:0 FPO:4 SZ:67/67/1.00 VC:1
ENC:RLE,BIT_PACKED,PLAIN
$ parquet-dump 000000_0
row group 0
---------------------------------------------------------------------------------------
_c0: INT96 UNCOMPRESSED DO:0 FPO:4 SZ:67/67/1.00 VC:1 ENC:RLE,BIT_PACKED,PLAIN
_c0 TV=1 RL=0 DL=1
-----------------------------------------------------------------------------------
page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:18 VC:1
INT96 _c0
---------------------------------------------------------------------------------------
*** row group 1 of 1, values 1 to 1 ***
value 1: R:0 D:1 V:651896637159333601027328
{noformat}
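For reference, the INT96 value dumped above follows the Hive/Impala convention for Parquet timestamps: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian Day Number. A minimal decoding sketch (the {{decode_int96_timestamp}} helper and the sample bytes are illustrative, not the attached file's actual payload):

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian Day Number of the Unix epoch, 1970-01-01.
JULIAN_DAY_OF_EPOCH = 2440588

def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte Hive/Impala-style INT96 Parquet timestamp:
    8 little-endian bytes of nanoseconds within the day, followed by
    4 little-endian bytes holding the Julian Day Number."""
    nanos, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_EPOCH
    return (datetime(1970, 1, 1, tzinfo=timezone.utc)
            + timedelta(days=days, microseconds=nanos // 1000))

# Illustrative bytes: midnight UTC on 2015-01-01, whose Julian Day
# Number is 2457024, with zero nanoseconds elapsed in the day.
raw = struct.pack("<qi", 0, 2457024)
print(decode_int96_timestamp(raw))  # 2015-01-01 00:00:00+00:00
```

Note that a decoder which instead treats the Julian Day as beginning at noon (i.e. adds half a day during conversion) would report 2015-01-01 12:00:00 for the same bytes, which is exactly the 12-hour shift described in the ticket.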
> Parquet support interprets timestamp values differently from Hive 0.14.0+
> -------------------------------------------------------------------------
>
> Key: SPARK-10177
> URL: https://issues.apache.org/jira/browse/SPARK-10177
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Reporter: Cheng Lian
> Assignee: Cheng Lian
> Priority: Blocker
> Attachments: 000000_0
>
>
> Run the following SQL under Hive 0.14.0+ (tested against 0.14.0 and
> 1.2.1):
> {code:sql}
> CREATE TABLE ts_test STORED AS PARQUET
> AS SELECT CAST("2015-01-01 00:00:00" AS TIMESTAMP);
> {code}
> Then read the Parquet file generated by Hive with Spark SQL:
> {noformat}
> scala>
> sqlContext.read.parquet("hdfs://localhost:9000/user/hive/warehouse_hive14/ts_test").collect()
> res1: Array[org.apache.spark.sql.Row] = Array([2015-01-01 12:00:00.0])
> {noformat}
> Spark 1.4.1 works as expected in this case.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)