[
https://issues.apache.org/jira/browse/DRILL-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332485#comment-14332485
]
Adam Gilmore commented on DRILL-2286:
-------------------------------------
Right you are - a duplicate it is.
> Parquet compression causes read errors
> --------------------------------------
>
> Key: DRILL-2286
> URL: https://issues.apache.org/jira/browse/DRILL-2286
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 0.8.0
> Reporter: Adam Gilmore
> Assignee: Steven Phillips
> Priority: Critical
>
> From what I can see, since compression was added to the Parquet writer,
> read errors can occur.
> Types such as timestamp and decimal are stored physically as int64 with
> converted-type metadata. It appears that when the column chunk is
> compressed, the reader tries to read the int64 values into a vector of the
> timestamp/decimal type, which causes a cast error.
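> To make the failure mode concrete, here is a minimal, self-contained Java
> sketch (hypothetical classes, not Drill's actual vector or reader
> implementations) of a reader keyed on the physical type (int64) writing into
> a vector that was allocated from the SQL type (decimal):
> {code}
> // Hypothetical stand-ins for Drill's value vectors.
> abstract class ValueVector {}
>
> class NullableBigIntVector extends ValueVector {
>     void setLong(int index, long value) { /* store raw int64 */ }
> }
>
> class NullableDecimal18Vector extends ValueVector {
>     void setUnscaled(int index, long unscaled) { /* store decimal(18,8) */ }
> }
>
> public class CastRepro {
>     // A reader chosen from the physical Parquet type (INT64) assumes a
>     // BigInt vector, but the vector was built from the SQL schema.
>     static void readInt64(ValueVector target, long raw) {
>         ((NullableBigIntVector) target).setLong(0, raw); // ClassCastException
>     }
>
>     public static void main(String[] args) {
>         // DECIMAL(18,8) column -> decimal vector, but an int64 reader:
>         readInt64(new NullableDecimal18Vector(), 150000000L);
>     }
> }
> {code}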
> Here's the JSON file I'm using:
> {code}
> { "a": 1.5 }
> { "a": 3.5 }
> { "a": 1.5 }
> { "a": 2.5 }
> { "a": 1.5 }
> { "a": 5.5 }
> { "a": 1.5 }
> { "a": 6.0 }
> { "a": 1.5 }
> {code}
> Now create a Parquet table like so:
> {code}
> create table dfs.tmp.test as (select cast(a as decimal(18,8)) from dfs.tmp.`test.json`)
> {code}
> Now when you try to query it like so:
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.tmp.test;
> Query failed: RemoteRpcException: Failure while running fragment.,
> org.apache.drill.exec.vector.NullableDecimal18Vector cannot be cast to
> org.apache.drill.exec.vector.NullableBigIntVector [
> 91e23d42-fa06-4429-b78e-3ff32352e660 on ...:31010 ]
> [ 91e23d42-fa06-4429-b78e-3ff32352e660 on ...:31010 ]
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> {noformat}
> The same happens for timestamps, for example.
> The relevant code is in ColumnReaderFactory: when the column chunk is
> encoded, it creates a reader based on the primitive type of the column
> (int64 in this case) rather than the converted timestamp/decimal type.
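> A rough sketch of the dispatch pattern being described (hypothetical types,
> not the real ColumnReaderFactory source): the reader is selected from the
> primitive type alone, so the converted-type metadata is dropped:
> {code}
> // Hypothetical enums standing in for Parquet's type metadata.
> enum PrimitiveType { INT32, INT64, BINARY }
> enum ConvertedType { NONE, DECIMAL, TIMESTAMP }
>
> interface ColumnReader { void readField(); }
>
> class ColumnReaderFactorySketch {
>     static ColumnReader create(PrimitiveType primitive, ConvertedType converted) {
>         switch (primitive) {
>             case INT64:
>                 // BUG (as described above): `converted` is never consulted,
>                 // so DECIMAL/TIMESTAMP columns get a reader that casts the
>                 // target vector to NullableBigIntVector.
>                 return () -> { /* fill a NullableBigIntVector */ };
>             default:
>                 throw new UnsupportedOperationException(primitive.name());
>         }
>     }
> }
> {code}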
> This is pretty severe, as compression now appears to be enabled by default.
> I do note that with only 1-2 records in the JSON file, the writer doesn't
> bother compressing and the queries then work fine.