[
https://issues.apache.org/jira/browse/DRILL-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645260#comment-15645260
]
Rahul Challapalli commented on DRILL-4996:
------------------------------------------
[~vitalii] Based on my understanding of your comments, Drill 1.6.0 somehow
wrote correct values for the date column. To test this, I went back to a commit
before the fix for DRILL-4203, expecting Drill to return correct values. However,
I still get wrong values.
{code}
[root@qa-node190 drillAutomation]# /opt/drill/bin/sqlline -u jdbc:drill:zk=10.10.100.190:5181
apache drill 1.9.0-SNAPSHOT
"say hello to my little drill"
0: jdbc:drill:zk=10.10.100.190:5181> select * from sys.version;
+-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+-----------------------------+----------------------------+
| version | commit_id | commit_message | commit_time | build_email | build_time |
+-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+-----------------------------+----------------------------+
| 1.9.0-SNAPSHOT | 17b96484998691687a00aa74cd69eef2fff6bae1 | DRILL-4870 drill-config.sh sets JAVA_HOME incorrectly for the Mac | 14.10.2016 @ 10:53:31 PDT | [email protected] | 07.11.2016 @ 11:39:46 PST |
+-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+-----------------------------+----------------------------+
1 row selected (0.712 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select i_rec_start_date, i_size from
dfs.`/drill/testdata/parquet_date/auto_partition/item_multipart_autorefresh`
group by i_rec_start_date, i_size;
+-------------------+--------------+
| i_rec_start_date | i_size |
+-------------------+--------------+
| null | large |
| 366-11-08 | extra large |
| 366-11-08 | medium |
| null | medium |
| 366-11-08 | petite |
| 364-11-07 | medium |
| null | petite |
| 365-11-07 | medium |
| 368-11-07 | economy |
| 365-11-07 | large |
| 365-11-07 | small |
| 366-11-08 | small |
| 365-11-07 | extra large |
| 364-11-07 | N/A |
| 366-11-08 | economy |
| 366-11-08 | large |
| 364-11-07 | small |
| null | small |
| 364-11-07 | large |
| 364-11-07 | extra large |
| 368-11-07 | N/A |
| 368-11-07 | extra large |
| 368-11-07 | large |
| 365-11-07 | petite |
| null | N/A |
| 365-11-07 | economy |
| 364-11-07 | economy |
| 364-11-07 | petite |
| 365-11-07 | N/A |
| 368-11-07 | medium |
| null | extra large |
| 368-11-07 | small |
| 368-11-07 | petite |
| 366-11-08 | N/A |
+-------------------+--------------+
34 rows selected (1.303 seconds)
{code}
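One way to make "wrong values" concrete is to flag every returned date whose year falls outside a plausible window. The following is only a minimal sketch: the helper name `suspicious_dates` and the 1900-2100 window are assumptions chosen for illustration, not anything Drill defines, and the sample rows are copied from the output above.

```python
# Hypothetical helper: flag DATE values whose year falls outside a plausible window.
# The 1900-2100 window is an assumption for illustration, not something Drill defines.
PLAUSIBLE_YEARS = range(1900, 2101)


def suspicious_dates(rows):
    """Return the distinct non-null date strings whose year looks implausible."""
    flagged = set()
    for date_str, _size in rows:
        if date_str == "null":
            continue  # nulls are reported separately, not treated as corrupt
        # Parse the year by hand: values such as "366-11-08" have a 3-digit year.
        year = int(date_str.split("-")[0])
        if year not in PLAUSIBLE_YEARS:
            flagged.add(date_str)
    return sorted(flagged)


# A few (i_rec_start_date, i_size) rows copied from the query output above.
sample = [
    ("null", "large"),
    ("366-11-08", "extra large"),
    ("364-11-07", "medium"),
    ("365-11-07", "medium"),
    ("368-11-07", "economy"),
]
print(suspicious_dates(sample))
```

Every non-null date in the sample is flagged, which matches the observation that the query returns no plausible dates at all.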
Thoughts?
> Parquet Date auto-correction is not working in auto-partitioned parquet files
> generated by drill-1.6
> ----------------------------------------------------------------------------------------------------
>
> Key: DRILL-4996
> URL: https://issues.apache.org/jira/browse/DRILL-4996
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Reporter: Rahul Challapalli
> Priority: Critical
> Attachments: item.tgz
>
>
> git.commit.id.abbrev=4ee1d4c
> Below are the steps I followed to generate the data:
> {code}
> 1. Generate a parquet file with a date column using Hive 1.2
> 2. Use Drill 1.6 to create auto-partitioned parquet files partitioned on the
> date column
> {code}
> Now the query below returns wrong results:
> {code}
> select i_rec_start_date, i_size from
> dfs.`/drill/testdata/parquet_date/auto_partition/item_multipart_autorefresh`
> group by i_rec_start_date, i_size;
> +-------------------+--------------+
> | i_rec_start_date | i_size |
> +-------------------+--------------+
> | null | large |
> | 366-11-08 | extra large |
> | 366-11-08 | medium |
> | null | medium |
> | 366-11-08 | petite |
> | 364-11-07 | medium |
> | null | petite |
> | 365-11-07 | medium |
> | 368-11-07 | economy |
> | 365-11-07 | large |
> | 365-11-07 | small |
> | 366-11-08 | small |
> | 365-11-07 | extra large |
> | 364-11-07 | N/A |
> | 366-11-08 | economy |
> | 366-11-08 | large |
> | 364-11-07 | small |
> | null | small |
> | 364-11-07 | large |
> | 364-11-07 | extra large |
> | 368-11-07 | N/A |
> | 368-11-07 | extra large |
> | 368-11-07 | large |
> | 365-11-07 | petite |
> | null | N/A |
> | 365-11-07 | economy |
> | 364-11-07 | economy |
> | 364-11-07 | petite |
> | 365-11-07 | N/A |
> | 368-11-07 | medium |
> | null | extra large |
> | 368-11-07 | small |
> | 368-11-07 | petite |
> | 366-11-08 | N/A |
> +-------------------+--------------+
> 34 rows selected (0.691 seconds)
> {code}
> However, when I generated the auto-partitioned parquet files using Drill 1.2,
> the above query returned the right results.
> I attached the required data sets.