[ 
https://issues.apache.org/jira/browse/DRILL-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15644818#comment-15644818
 ] 

Parth Chandra commented on DRILL-4996:
--------------------------------------

[~vitalii] I'm not sure what you are recommending. Are you suggesting that, 
even when the Drill version is known, you would inspect the actual data values 
to decide whether they need correction?
It looks like you might have to do that.
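
To make the idea of a value-based check concrete, here is a minimal sketch in 
Python. It is not Drill's implementation. The shift constant (twice the Julian 
day number of the Unix epoch) and the year-5000 cutoff are assumptions that do 
not come from this thread, and the function names are hypothetical.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)

# Assumption: corrupt values are offset by twice the Julian day number
# of the Unix epoch. This constant is not stated anywhere in this thread.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
ASSUMED_SHIFT = 2 * JULIAN_DAY_OF_UNIX_EPOCH

# Assumption: any date past the year 5000 is treated as implausible.
ASSUMED_THRESHOLD = (date(5000, 1, 1) - UNIX_EPOCH).days


def looks_corrupt(days_since_epoch):
    """Return True if the raw value is beyond the plausibility cutoff."""
    return days_since_epoch > ASSUMED_THRESHOLD


def correct_if_needed(days_since_epoch):
    """Subtract the assumed shift only when the value looks corrupt."""
    if looks_corrupt(days_since_epoch):
        return days_since_epoch - ASSUMED_SHIFT
    return days_since_epoch


def to_date(days_since_epoch):
    """Convert a (corrected) day count to a calendar date."""
    return UNIX_EPOCH + timedelta(days=days_since_epoch)
```

With a check like this, the decision depends only on the stored values, so it 
would still apply when the writer version in the file metadata is missing or 
misleading.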

> Parquet Date auto-correction is not working in auto-partitioned parquet files 
> generated by drill-1.6
> ----------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4996
>                 URL: https://issues.apache.org/jira/browse/DRILL-4996
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: item.tgz
>
>
> git.commit.id.abbrev=4ee1d4c
> Below are the steps I followed to generate the data:
> {code}
> 1. Generate a Parquet file with a date column using Hive 1.2
> 2. Use Drill 1.6 to create auto-partitioned Parquet files partitioned on the 
> date column
> {code}
> The query below now returns wrong results:
> {code}
> select i_rec_start_date, i_size from 
> dfs.`/drill/testdata/parquet_date/auto_partition/item_multipart_autorefresh`  
> group by i_rec_start_date, i_size;
> +-------------------+--------------+
> | i_rec_start_date  |    i_size    |
> +-------------------+--------------+
> | null              | large        |
> | 366-11-08         | extra large  |
> | 366-11-08         | medium       |
> | null              | medium       |
> | 366-11-08         | petite       |
> | 364-11-07         | medium       |
> | null              | petite       |
> | 365-11-07         | medium       |
> | 368-11-07         | economy      |
> | 365-11-07         | large        |
> | 365-11-07         | small        |
> | 366-11-08         | small        |
> | 365-11-07         | extra large  |
> | 364-11-07         | N/A          |
> | 366-11-08         | economy      |
> | 366-11-08         | large        |
> | 364-11-07         | small        |
> | null              | small        |
> | 364-11-07         | large        |
> | 364-11-07         | extra large  |
> | 368-11-07         | N/A          |
> | 368-11-07         | extra large  |
> | 368-11-07         | large        |
> | 365-11-07         | petite       |
> | null              | N/A          |
> | 365-11-07         | economy      |
> | 364-11-07         | economy      |
> | 364-11-07         | petite       |
> | 365-11-07         | N/A          |
> | 368-11-07         | medium       |
> | null              | extra large  |
> | 368-11-07         | small        |
> | 368-11-07         | petite       |
> | 366-11-08         | N/A          |
> +-------------------+--------------+
> 34 rows selected (0.691 seconds)
> {code}
> However, when I generated the auto-partitioned Parquet files using Drill 1.2, 
> the above query returned the correct results.
> I have attached the required data sets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
