[ 
https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549577#comment-15549577
 ] 

Vitalii Diravka commented on DRILL-4203:
----------------------------------------

[~rkins] It is correct that Drill auto-corrects it. And yes, you are using the 
wrong option. The behaviour of both readers is the same. If you want to 
disable auto-correction, you should use the parquet format config in the 
storage plugin settings. Something like this: {code}  "formats": {
    "parquet": {
      "type": "parquet",
      "autoCorrectCorruptDates": false
    }
  }{code}
Or you can try the following query: {code}select l_shipdate, l_commitdate 
from 
table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem`
 (type => 'parquet', autoCorrectCorruptDates => false)) limit 1;{code}
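
For context, here is a rough sketch of what the auto-correction amounts to. It assumes the corrupt values are shifted by twice the Julian day number of the Unix epoch (2 * 2440588 = 4881176, which matches the epoch_date value in the issue description). The class, constant and method names and the year-5000 cutoff are illustrative only and are not Drill's actual code.

```java
import java.time.LocalDate;

public class CorruptDateSketch {
    // Julian day number of 1970-01-01; the stored value in the issue is twice this.
    static final int JULIAN_DAY_OF_UNIX_EPOCH = 2440588;
    static final int ASSUMED_SHIFT = 2 * JULIAN_DAY_OF_UNIX_EPOCH; // 4881176

    // Hypothetical cutoff: treat any day count past year 5000 as a shifted value.
    static final long ASSUMED_THRESHOLD = LocalDate.of(5000, 1, 1).toEpochDay();

    // Interprets a Parquet DATE (days since epoch), optionally undoing the shift.
    static LocalDate read(int storedDays, boolean autoCorrect) {
        long days = storedDays;
        if (autoCorrect && days > ASSUMED_THRESHOLD) {
            days -= ASSUMED_SHIFT;
        }
        return LocalDate.ofEpochDay(days);
    }

    public static void main(String[] args) {
        System.out.println(read(4881176, true));   // corrected reading
        System.out.println(read(4881176, false));  // raw reading, far in the future
    }
}
```

With correction disabled, the raw value is read as a date well beyond the year 9999, which is why the option matters when comparing the two readers.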

It would also be good to investigate further whether Drill can store dates 
beyond the year 9999, because from the Drill shell I can't get such values: {code}
0: jdbc:drill:zk=local> select TO_DATE(262784904600000) from (VALUES(1));
+-------------+
|   EXPR$0    |
+-------------+
| 297-04-27  |
+-------------+
{code}
But from a Drill unit test I can:
{code}  @Test
  public void myTest() throws Exception {
    String query = "select TO_DATE(262784904600000) from (VALUES(1))";
    setColumnWidths(new int[] {35});
    List<QueryDataBatch> sqlWithResults = testSqlWithResults(query);
    printResult(sqlWithResults);
  }
1 row(s):
--------------------------------------
| EXPR$0<DATE(REQUIRED)>             |
--------------------------------------
| 10297-04-27T22:50:00.000Z          |
--------------------------------------
{code}
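
As a cross-check outside Drill, here is a minimal java.time sketch that converts the same millisecond value to a UTC calendar date. The class and method names are illustrative and not part of Drill. If it agrees with the unit test output, the shell result may be a display truncation rather than a different stored value, but that reading is an assumption and has not been verified against the shell code.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class EpochMillisCheck {
    // Converts milliseconds since the Unix epoch to a UTC calendar date.
    static LocalDate toUtcDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        LocalDate date = toUtcDate(262784904600000L);
        // Print the fields separately to avoid the '+' prefix java.time uses for 5-digit years.
        System.out.println(date.getYear() + "-" + date.getMonthValue() + "-" + date.getDayOfMonth());
    }
}
```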

> Parquet File : Date is stored wrongly
> -------------------------------------
>
>                 Key: DRILL-4203
>                 URL: https://issues.apache.org/jira/browse/DRILL-4203
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Stéphane Trou
>            Assignee: Vitalii Diravka
>            Priority: Critical
>             Fix For: 1.9.0
>
>
> Hello,
> I have some problems when I try to read Parquet files produced by Drill with 
> Spark: all dates are corrupted.
> I think the problem comes from Drill :)
> {code}
> cat /tmp/date_parquet.csv 
> Epoch,1970-01-01
> {code}
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) 
> as epoch_date from dfs.tmp.`date_parquet.csv`;
> +--------+-------------+
> |  name  | epoch_date  |
> +--------+-------------+
> | Epoch  | 1970-01-01  |
> +--------+-------------+
> {code}
> {code}
> 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select 
> columns[0] as name, cast(columns[1] as date) as epoch_date from 
> dfs.tmp.`date_parquet.csv`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1                          |
> +-----------+----------------------------+
> {code}
> When I read the file with parquet-tools, I found:
> {code}
> java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
> name = Epoch
> epoch_date = 4881176
> {code}
> According to 
> [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], 
> epoch_date should be equal to 0.
> Meta:
> {code}
> java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
> file:        file:/tmp/buggy_parquet/0_0_0.parquet 
> creator:     parquet-mr version 1.8.1-drill-r0 (build 
> 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
> extra:       drill.version = 1.4.0 
> file schema: root 
> --------------------------------------------------------------------------------
> name:        OPTIONAL BINARY O:UTF8 R:0 D:1
> epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1
> row group 1: RC:1 TS:93 OFFSET:4 
> --------------------------------------------------------------------------------
> name:         BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 
> ENC:RLE,BIT_PACKED,PLAIN
> {code}


