[
https://issues.apache.org/jira/browse/PARQUET-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127110#comment-17127110
]
Ben Watson commented on PARQUET-1870:
-------------------------------------
For backstory, I maintain an [Avro and Parquet Viewer IntelliJ
plugin|https://github.com/benwatson528/intellij-avro-parquet-plugin] that
allows Avro and Parquet files to be displayed visually, and a repeated
complaint is that it's not possible to view files containing INT96 columns.
I have been able to solve this by replacing
[AvroSchemaConverter#308|https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L307-L309]:
{code:java}
public Schema convertINT96(PrimitiveTypeName primitiveTypeName) {
  throw new IllegalArgumentException("INT96 not implemented and is deprecated");
}
{code}
with
{code:java}
public Schema convertINT96(PrimitiveTypeName primitiveTypeName) {
  // Map INT96 to an opaque Avro BYTES field instead of throwing.
  return Schema.create(Schema.Type.BYTES);
}
{code}
The INT96 values are then rendered as raw, undecoded bytes, so they print as
gibberish, but at least the files are displayed.
I'm happy to raise a PR for this, but first want to check that this is an
acceptable solution and that no one else has any better ideas.
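
For anyone who wants readable values rather than raw bytes, here is a rough sketch of how a viewer could decode them after the fact. It assumes the INT96 column holds a timestamp in the layout commonly written by Impala, Hive and Spark: 12 little-endian bytes, with the first 8 holding nanoseconds within the day and the last 4 holding the Julian day number. The class name {{Int96Timestamps}} is hypothetical and this is not part of parquet-avro; a column that uses INT96 for something other than timestamps would decode to nonsense.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, not part of parquet-avro.
// Assumes the Impala/Hive/Spark INT96 timestamp layout described above.
final class Int96Timestamps {
    // Julian day number that corresponds to 1970-01-01 (the Unix epoch).
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    private Int96Timestamps() {}

    static Instant toInstant(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0-7: nanoseconds within the day
        long julianDay = buf.getInt();   // bytes 8-11: Julian day number
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY
                + nanosOfDay / NANOS_PER_SECOND;
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay % NANOS_PER_SECOND);
    }
}
```

A viewer could apply {{Int96Timestamps.toInstant(...)}} to the byte array read from each BYTES field that originated from an INT96 column, and fall back to showing the raw bytes if the length is not 12.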
> Handle INT96 more gracefully in parquet-avro
> --------------------------------------------
>
> Key: PARQUET-1870
> URL: https://issues.apache.org/jira/browse/PARQUET-1870
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-avro
> Affects Versions: 1.11.0
> Reporter: Ben Watson
> Priority: Minor
>
> The parquet-avro library does not support INT96 columns (PARQUET-323), and
> any attempt to process a file containing such a column results in:
> {code:java}
> throw new IllegalArgumentException("INT96 not implemented and is deprecated");
> {code}
> INT96 is still used in many legacy datasets, and so it would be useful to be
> able to process Parquet files containing these records, even if the INT96
> values themselves aren't rendered.
> The same functionality has already been re-added into parquet-pig
> (PARQUET-1133).