[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

ASF GitHub Bot (JIRA) Mon, 03 Oct 2016 12:50:43 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543230#comment-15543230
 ]


ASF GitHub Bot commented on DRILL-4203:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/595#discussion_r81625440
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java 
---
    @@ -935,6 +972,11 @@ public ColumnTypeMetadata_v2 
getColumnTypeInfo(String[] name) {
         @JsonIgnore @Override public ParquetTableMetadataBase clone() {
           return new ParquetTableMetadata_v2(files, directories, 
columnTypeInfo);
         }
    +
    +    @JsonIgnore @Override public boolean isDateCorrect() {
    +      return isDateCorrect;
    --- End diff --
    
    If metadata cache file is existed Drill reads it instead of retrieving 
metadata from multiple Parquet files. In the case when it was generated with 
drill after this commit the value of isDateCorrect will be true. In the case 
when it was generated with drill before this commit the isDateCorrect field in 
metadata cache file will be absent and value of this will be false in 
ParquetTableMetadata_v2.
    And according to this value we just define DateCorruptionStatus (you can 
see more in ParquetReaderUtility.correctDatesInMetadataCache()). The leftover 
way of data checking in the cache was not changed.


> Parquet File : Date is stored wrongly
> -------------------------------------
>
>                 Key: DRILL-4203
>                 URL: https://issues.apache.org/jira/browse/DRILL-4203
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Stéphane Trou
>            Assignee: Vitalii Diravka
>            Priority: Critical
>             Fix For: 1.9.0
>
>
> Hello,
> I have some problems when i try to read parquet files produce by drill with  
> Spark,  all dates are corrupted.
> I think the problem come from drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> +--------+-------------+
> |  name  | epoch_date  |
> +--------+-------------+
> | Epoch  | 1970-01-01  |
> +--------+-------------+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1                          |
> +-----------+----------------------------+
> {code}
> When I read the file with parquet tools, i found  
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equals to 0.
> Meta : 
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:        file:/tmp/buggy_parquet/0_0_0.parquet 
> creator:     parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:       drill.version = 1.4.0 
> file schema: root 
> --------------------------------------------------------------------------------
> name:        OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> --------------------------------------------------------------------------------
> name:         BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly

Reply via email to