[ 
https://issues.apache.org/jira/browse/ARROW-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314834#comment-17314834
 ] 

Joris Van den Bossche commented on ARROW-12203:
-----------------------------------------------

[~apitrou] see also the mailing list discussion from December (with title 
"Should we default to write parquet format version 2.0? (not data page version 
2.0)", 
https://mail-archives.apache.org/mod_mbox/arrow-dev/202012.mbox/%3CCALQtMBYqPPkE6RQiNDxXz7yOtnbqtQGH%2Bk%2B20ryomGtLE9EfVA%40mail.gmail.com%3E)

See also this overview of which converted/logical types were added in which 
format version (it covers types only, not encodings): 
https://nbviewer.jupyter.org/gist/jorisvandenbossche/3cc9942eaffb53564df65395e5656702

My conclusion in that email thread was also that the NANOS time unit might be 
problematic to enable by default already (I don't know what the status of this 
feature is in other implementations).

Another option could be to have a {{version="2.4"}}, which would e.g. enable 
the logical types for integers but not yet for nanoseconds (it would then map 
more or less to the actual Parquet format version, instead of the pseudo "1.9").
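
To make the trade-off between these options concrete, here is a minimal 
sketch in plain Python. It is not pyarrow's API: the table, the feature names 
and the helper {{can_store_faithfully}} are hypothetical, and the feature 
grouping per version is an assumption based only on what is discussed in this 
issue.

```python
# Hypothetical model of the version options discussed in this comment.
# NOT pyarrow's API: names and groupings are assumptions for illustration.
FEATURES_BY_VERSION = {
    # Maximum compatibility: no newer logical types are written.
    "1.0": frozenset(),
    # Proposed option: integer logical types (e.g. UINT32), no nanoseconds.
    "2.4": frozenset({"integer_logical_types"}),
    # Current "2.0" option: assumed to also enable nanosecond timestamps.
    "2.0": frozenset({"integer_logical_types", "nanosecond_timestamps"}),
}


def can_store_faithfully(version, feature):
    """Return True if `feature` would be written as-is for `version`."""
    try:
        enabled = FEATURES_BY_VERSION[version]
    except KeyError:
        raise ValueError(f"unknown version option: {version!r}") from None
    return feature in enabled


if __name__ == "__main__":
    # Print which features each option would enable, in a stable order.
    for version in sorted(FEATURES_BY_VERSION):
        print(version, sorted(FEATURES_BY_VERSION[version]))
```

Under these assumptions, a default of "2.4" would keep UINT32 columns intact 
while leaving nanosecond timestamps opt-in.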

bq. the value written in FileMetaData.version (1 or 2), which isn't described 
anywhere in the format spec (presumably version == 2 starting from Parquet 
format 2.0.0?)

There is indeed no spec for this; there was some discussion about it on 
the "core features" PR: 
https://github.com/apache/parquet-format/pull/164#discussion_r569228238



> [C++][Python] Switch default Parquet version to 2.0
> ---------------------------------------------------
>
>                 Key: ARROW-12203
>                 URL: https://issues.apache.org/jira/browse/ARROW-12203
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: C++, Python
>            Reporter: Antoine Pitrou
>            Priority: Major
>             Fix For: 4.0.0
>
>
> Currently, Parquet write APIs default to maximum-compatibility Parquet 
> version "1.0", which disables some logical types such as UINT32. We may want 
> to switch the default to "2.0" instead, to allow faithful representation of 
> more types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
