[
https://issues.apache.org/jira/browse/ARROW-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314834#comment-17314834
]
Joris Van den Bossche commented on ARROW-12203:
-----------------------------------------------
[~apitrou] see also the mailing list discussion from December (with title
"Should we default to write parquet format version 2.0? (not data page version
2.0)",
https://mail-archives.apache.org/mod_mbox/arrow-dev/202012.mbox/%3CCALQtMBYqPPkE6RQiNDxXz7yOtnbqtQGH%2Bk%2B20ryomGtLE9EfVA%40mail.gmail.com%3E)
See also this overview of which converted/logical types were added in which
format version:
https://nbviewer.jupyter.org/gist/jorisvandenbossche/3cc9942eaffb53564df65395e5656702
(for types, not for encodings)
My conclusion in that email thread was also that NANOS might be problematic to
enable by default already (I don't know what the status of this feature is in
other implementations).
Another option could be to have a {{version="2.4"}} which would, e.g., enable
the logical types for integers but not yet for nanoseconds (it would then map
more or less to the actual Parquet format version, instead of the pseudo "1.9").
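To make the options concrete, here is a rough sketch of how the version setting
could gate the two features discussed here. Everything in it is an assumption
for illustration: the feature names, the table and the {{is_enabled}} helper are
hypothetical and are not the Arrow C++ or pyarrow API, and "2.4" is only the
option proposed above.

```python
# Hypothetical feature names, for illustration only (not an Arrow/pyarrow API).
INTEGER_LOGICAL_TYPES = "integer_logical_types"  # e.g. UINT32
NANOS_TIMESTAMPS = "nanos_timestamps"            # NANOS time unit

# Which features each writer "version" option would enable, as discussed in
# this issue. "2.4" is the proposed intermediate option, not an existing one.
ENABLED_FEATURES = {
    "1.0": frozenset(),
    "2.4": frozenset({INTEGER_LOGICAL_TYPES}),
    "2.0": frozenset({INTEGER_LOGICAL_TYPES, NANOS_TIMESTAMPS}),
}


def is_enabled(version: str, feature: str) -> bool:
    """Return True if the given version option would enable the feature."""
    try:
        return feature in ENABLED_FEATURES[version]
    except KeyError:
        raise ValueError(f"unknown Parquet version option: {version!r}") from None


if __name__ == "__main__":
    for version in ("1.0", "2.4", "2.0"):
        print(version, sorted(ENABLED_FEATURES[version]))
```

With such a table, a "2.4" default would give faithful integer types while
leaving the nanoseconds question open for later.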
bq. the value written in FileMetaData.version (1 or 2), which isn't described
anywhere in the format spec (presumably version == 2 starting from Parquet
format 2.0.0?)
There is indeed no spec for this; there was some discussion about it on
the "core features" PR:
https://github.com/apache/parquet-format/pull/164#discussion_r569228238
> [C++][Python] Switch default Parquet version to 2.0
> ---------------------------------------------------
>
> Key: ARROW-12203
> URL: https://issues.apache.org/jira/browse/ARROW-12203
> Project: Apache Arrow
> Issue Type: Wish
> Components: C++, Python
> Reporter: Antoine Pitrou
> Priority: Major
> Fix For: 4.0.0
>
>
> Currently, Parquet write APIs default to maximum-compatibility Parquet
> version "1.0", which disables some logical types such as UINT32. We may want
> to switch the default to "2.0" instead, to allow faithful representation of
> more types.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)