[ https://issues.apache.org/jira/browse/ARROW-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314786#comment-17314786 ]
Antoine Pitrou commented on ARROW-12203:
----------------------------------------
I'll note that ParquetVersion::PARQUET_2_0 guards several unrelated things:
* TIME_MILLIS, TIMESTAMP_MILLIS and UINT32, which were added in Parquet format
2.1 (PARQUET-12, July 2014)
* TIME_MICROS, TIMESTAMP_MICROS, which were added in Parquet format 2.3
(PARQUET-200, June 2015)
* the NANOS unit for times and timestamps, which was added in Parquet format
2.5 (PARQUET-1387, August 2018)
* RLE_DICTIONARY, which was added in Parquet format 1.0 (commit
eb2f34ca775476ec9955aa88a8ac5c0583114f72, no associated JIRA, November 2013)
* the value written in FileMetaData.version (1 or 2), which isn't described
anywhere in the format spec (presumably version == 2 starting from Parquet
format 2.0.0?)
So it's a mess. Some of those changes are very old, though. It seems we could
enable all of them by default, except NANOS?
One possibility would be to enable everything except NANOS for "1.0" and keep
NANOS in "2.0".
Another possibility would be to add a new "1.9" pseudo-version, enable
everything except NANOS in "1.9", and make "1.9" the default.
Or we bite the bullet and make "2.0" the default, including all of the above.
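
To make the trade-off between these possibilities easier to compare, here is a rough sketch in plain Python (it does not use Arrow at all). The feature names and format releases are copied from the list above. The proposal labels ("current", "extend_1.0", "pseudo_1.9", "default_2.0") and the helper functions are hypothetical names made up for illustration, and the FileMetaData.version value is left out because the spec does not say when it should change.

```python
# Format release that introduced each feature currently guarded by
# ParquetVersion::PARQUET_2_0, as (major, minor), per the list above.
FEATURE_ADDED_IN = {
    "TIME_MILLIS": (2, 1),
    "TIMESTAMP_MILLIS": (2, 1),
    "UINT32": (2, 1),
    "TIME_MICROS": (2, 3),
    "TIMESTAMP_MICROS": (2, 3),
    "NANOS": (2, 5),
    "RLE_DICTIONARY": (1, 0),
}

ALL_FEATURES = frozenset(FEATURE_ADDED_IN)

# Hypothetical proposal labels (not Arrow identifiers), each mapped to
# (default writer version, features enabled under that default).
PROPOSALS = {
    "current": ("1.0", frozenset()),
    "extend_1.0": ("1.0", ALL_FEATURES - {"NANOS"}),
    "pseudo_1.9": ("1.9", ALL_FEATURES - {"NANOS"}),
    "default_2.0": ("2.0", ALL_FEATURES),
}


def enabled_by_default(proposal):
    """Return the set of guarded features a default writer would use."""
    return PROPOSALS[proposal][1]


def newest_release_needed(features):
    """Newest format release among the given features; (1, 0) if none."""
    return max((FEATURE_ADDED_IN[f] for f in features), default=(1, 0))


if __name__ == "__main__":
    for name, (version, features) in PROPOSALS.items():
        major, minor = newest_release_needed(features)
        print(f"{name}: default={version!r}, "
              f"features up to format {major}.{minor}")
```

Under these assumptions, the first two alternatives both yield default files that rely on features up to format 2.3, while making "2.0" the default raises that to 2.5 because of NANOS.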
> [C++][Python] Switch default Parquet version to 2.0
> ---------------------------------------------------
>
> Key: ARROW-12203
> URL: https://issues.apache.org/jira/browse/ARROW-12203
> Project: Apache Arrow
> Issue Type: Wish
> Components: C++, Python
> Reporter: Antoine Pitrou
> Priority: Major
>
> Currently, Parquet write APIs default to maximum-compatibility Parquet
> version "1.0", which disables some logical types such as UINT32. We may want
> to switch the default to "2.0" instead, to allow faithful representation of
> more types.