[ https://issues.apache.org/jira/browse/PARQUET-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127110#comment-17127110 ]

Ben Watson commented on PARQUET-1870:
-------------------------------------

For backstory, I maintain an [Avro and Parquet Viewer IntelliJ 
plugin|https://github.com/benwatson528/intellij-avro-parquet-plugin] that 
displays Avro and Parquet files visually, and a recurring complaint from users 
is that files containing INT96 columns cannot be viewed.

I have been able to work around this by replacing 
[AvroSchemaConverter#308|https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L307-L309]:
{code:java}
public Schema convertINT96(PrimitiveTypeName primitiveTypeName) {
  throw new IllegalArgumentException("INT96 not implemented and is deprecated");
}
{code}
with
{code:java}
public Schema convertINT96(PrimitiveTypeName primitiveTypeName) {
  return Schema.create(Schema.Type.BYTES);
}
{code}
With this change the INT96 values are shown as raw bytes, which look like 
gibberish, but at least the files are displayed.
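For context on why the raw bytes look like gibberish: in the convention used by Impala, Hive and Spark (a de facto layout, not something parquet-avro itself defines), an INT96 timestamp is 12 little-endian bytes, with 8 bytes of nanoseconds-of-day followed by 4 bytes of Julian day number. The sketch below assumes that layout; the class {{Int96Decoder}} and its methods are hypothetical helpers for illustration, not part of parquet-mr or Avro.

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, assuming the Impala/Hive/Spark INT96 timestamp layout.
public class Int96Decoder {
    // Julian day number of 1970-01-01 (the Unix epoch).
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    /** Decodes 12 INT96 bytes: 8 bytes nanos-of-day, then 4 bytes Julian day, both little-endian. */
    public static Instant toInstant(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 values must be 12 bytes, got " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0-7
        int julianDay = buf.getInt();    // bytes 8-11
        long epochDay = julianDay - JULIAN_DAY_OF_EPOCH;
        // ofEpochSecond normalises a nano adjustment larger than one second.
        return Instant.ofEpochSecond(epochDay * SECONDS_PER_DAY, nanosOfDay);
    }

    /** Renders raw bytes as lowercase hex, a readable fallback when values are not decoded. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Builds an INT96 value from a Julian day and nanos-of-day (the inverse of toInstant). */
    public static byte[] encode(int julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    public static void main(String[] args) {
        // Julian day 2440589 is 1970-01-02; 3.6e12 ns is one hour into that day.
        byte[] value = encode(2_440_589, 3_600_000_000_000L);
        System.out.println(toHex(value));
        System.out.println(toInstant(value));
    }
}
{code}

Mapping INT96 to BYTES keeps the schema conversion lossless, so a viewer could apply a decoder like this on top if it chooses, while still falling back to hex for files that do not follow the timestamp convention.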

I'm happy to raise a PR for this, but first I want to check that this is an 
acceptable solution and that no one has a better idea.

> Handle INT96 more gracefully in parquet-avro
> --------------------------------------------
>
>                 Key: PARQUET-1870
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1870
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-avro
>    Affects Versions: 1.11.0
>            Reporter: Ben Watson
>            Priority: Minor
>
> The parquet-avro library does not support INT96 columns (PARQUET-323), and 
> any attempt to process a file containing such a column results in:
> {code:java}
> throw new IllegalArgumentException("INT96 not implemented and is deprecated");
> {code}
> INT96 is still used in many legacy datasets, and so it would be useful to be 
> able to process Parquet files containing these records, even if the INT96 
> values themselves aren't rendered.
> The same functionality has already been re-added into parquet-pig 
> (PARQUET-1133).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
