[ https://issues.apache.org/jira/browse/PARQUET-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gang Wu resolved PARQUET-1972.
------------------------------
Resolution: Fixed
The current default version is 2.6. See GH-35746: [Parquet][C++][Python]
Switch default Parquet version to 2.6 (apache/arrow pull request #36137,
by anjakefala): https://github.com/apache/arrow/pull/36137
> [C++] Switch to format version 2 as default for writing Parquet
> ---------------------------------------------------------------
>
> Key: PARQUET-1972
> URL: https://issues.apache.org/jira/browse/PARQUET-1972
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Joris Van den Bossche
> Priority: Major
>
> Related to the thread on the arrow dev mailing list:
> https://lists.apache.org/thread.html/rf1a377c66990ae5ac0693119d416c93a7e19228d3eaaea8bd90acb17%40%3Cdev.arrow.apache.org%3E
> Currently, when writing Parquet files with Arrow (parquet-cpp), we default to
> Parquet format "1.0". In practice, this means that we don't use certain
> LogicalTypes (e.g. we don't write integers other than int32/int64, and we
> don't write nanosecond timestamps).
> I think it would be nice to enable nanosecond timestamps by default, but I
> also have no idea how widely this is already supported by other readers.
> To be clear, this is *not* about enabling _data page_ version 2 by default;
> in Arrow, that is governed by a separate option.
> While checking this, I made an overview of which types were introduced in
> which Parquet format version, in case someone wants to see the details ->
> https://nbviewer.jupyter.org/gist/jorisvandenbossche/3cc9942eaffb53564df65395e5656702