[ 
https://issues.apache.org/jira/browse/PARQUET-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243106#comment-17243106
 ] 

ASF GitHub Bot commented on PARQUET-1928:
-----------------------------------------

gszadovszky commented on pull request #831:
URL: https://github.com/apache/parquet-mr/pull/831#issuecomment-737864597


   The Parquet community was against adding INT96 support so as not to 
encourage our clients to use it, although I understand the requirement to 
support data that has already been written. (Meanwhile, since parquet-avro has 
never supported INT96, this change is needed for developing new functionality 
that depends on the deprecated INT96.)
   Anyway, I am fine with this change, but I do not really like that it works 
by default. What do you think about keeping the original behavior by default 
and introducing a configuration flag to switch it on? (See `writeParquetUUID` 
as an example.) This way we still do not encourage clients to use INT96, but 
they have the option to do so if necessary.
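
   To make the opt-in idea concrete, here is a minimal sketch in plain Java. 
It is not the parquet-avro implementation: the class name `Int96Policy`, the 
key `example.avro.readInt96AsFixed`, and the use of a plain `Map` in place of 
the real configuration object are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical helper, for illustration only.
final class Int96Policy {
    // Assumed key name; not necessarily the one parquet-avro would use.
    static final String READ_INT96_AS_FIXED = "example.avro.readInt96AsFixed";

    // Returns the Avro type to use for INT96, or throws when the flag is off.
    static String avroTypeForInt96(Map<String, String> conf) {
        // Absent key means "false", so the original behavior stays the default.
        boolean enabled =
            Boolean.parseBoolean(conf.getOrDefault(READ_INT96_AS_FIXED, "false"));
        if (!enabled) {
            throw new IllegalArgumentException(
                "INT96 is deprecated. Set " + READ_INT96_AS_FIXED
                    + " to true to read it as a 12-byte fixed.");
        }
        return "fixed(12)";
    }
}
```

   With an empty configuration the call throws, as readers do today; only an 
explicit `true` switches the conversion on.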



> Interpret Parquet INT96 type as FIXED[12] AVRO Schema
> -----------------------------------------------------
>
>                 Key: PARQUET-1928
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1928
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>    Affects Versions: 1.11.0
>            Reporter: Anant Damle
>            Priority: Minor
>              Labels: patch
>             Fix For: 1.12.0
>
>
> Reading Parquet files in Apache Beam with ParquetIO uses `AvroParquetReader`, 
> which throws `IllegalArgumentException("INT96 not implemented and is 
> deprecated")`.
> Customers have large datasets that cannot be reprocessed to convert them 
> into a supported type. An easier approach would be to convert each INT96 
> value into a byte array of 12 bytes, which developers can then interpret 
> in whatever way they want.
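
As an illustration of how a developer might interpret such a 12-byte value, 
the sketch below decodes it as a timestamp. It assumes the layout commonly 
written by Impala, Hive, and Spark (8 little-endian bytes of nanoseconds 
within the day, followed by a 4-byte little-endian Julian day number); other 
writers may use INT96 differently. The class name `Int96Timestamps` is 
hypothetical and not part of parquet-avro.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, for illustration only.
final class Int96Timestamps {
    // Julian day number of 1970-01-01, the Unix epoch.
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    // Decodes a 12-byte INT96 value under the assumed timestamp layout.
    static Instant toInstant(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0..7: nanoseconds within the day
        long julianDay = buf.getInt();   // bytes 8..11: Julian day number
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
        // Instant normalizes a nanosecond adjustment larger than one second.
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }
}
```

The resulting `Instant` carries no time zone information; whether the stored 
value was meant as UTC or as local time depends on the writer.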



