[
https://issues.apache.org/jira/browse/DRILL-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025680#comment-14025680
]
Jason Altekruse commented on DRILL-815:
---------------------------------------
Impala does not currently mark columns with the standard Parquet metadata that
indicates how the data should be read. Instead, it uses the Hive metastore to
persist this information. This goes against Drill's model, in which we avoid a
metastore and let users point at any file and read it.
For now, this means the data must be cast to varchar if you want it to be
shown as strings. We should talk to the Impala developers about writing this
metadata alongside the metastore, as this is an issue for every Hadoop project
that wants to read Parquet files produced by Impala.
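
For context, the `[B@7385c043`-style values in the output quoted below are what
Java prints when a byte[] is converted to a string without being decoded: the
type tag `[B` followed by an identity hash. The sketch below only illustrates
that behaviour and the effect of decoding the bytes as UTF-8. It is not Drill's
reader code; the class name, the helper, and the sample value "key-1" are
hypothetical, and UTF-8 is assumed as the encoding.

```java
import java.nio.charset.StandardCharsets;

class BinaryColumnDemo {
    // Hypothetical helper: interpret a raw binary column value as UTF-8 text.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for a string value stored as plain binary (assumed sample).
        byte[] raw = "key-1".getBytes(StandardCharsets.UTF_8);

        // Default array toString: "[B@" plus a hash that varies per run.
        System.out.println(raw.toString());

        // Decoding the same bytes recovers the original text.
        System.out.println(decodeUtf8(raw));
    }
}
```

Without a string annotation in the file, a reader has no way to know that the
second interpretation is the intended one, which is why the cast is needed.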
> Parquet files created in impala using data from hive tables resulted in
> incorrect string representation
> -------------------------------------------------------------------------------------------------------
>
> Key: DRILL-815
> URL: https://issues.apache.org/jira/browse/DRILL-815
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Reporter: Norris Lee
> Assignee: Jason Altekruse
>
> The Parquet file was created by first loading a CSV file into a Hive table. A
> Parquet table was then created in Impala and the data from the Hive table was
> loaded into it. The file was copied from HDFS to the local filesystem and
> placed into Drill's dfs.
> The keycolumn column in Hive is of type string.
> {code}
> 0: jdbc:drill:schema=hivestg> select * from `dfs`.`/opt/drill/integer.parquet`;
> +------------+------------+
> | keycolumn | column1 |
> +------------+------------+
> | [B@7385c043 | 0 |
> | [B@5211a9f5 | 1 |
> | [B@5ad3deb | -1 |
> | [B@30bc1236 | 2 |
> | [B@b4fb039 | 127 |
> | [B@1cba73fc | -128 |
> | [B@1514b420 | 255 |
> | [B@23dabb0 | 128 |
> | [B@1ed2b0f6 | -129 |
> | [B@1a5ff649 | 256 |
> | [B@12224026 | 32767 |
> | [B@6a18817 | -32768 |
> | [B@56eda167 | 65535 |
> | [B@aff9dc7 | -32769 |
> | [B@13cf7975 | 32768 |
> | [B@1a2efa7c | 65536 |
> | [B@23ef052 | 2147483647 |
> | [B@721398a4 | -2147483648 |
> +------------+------------+
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)