[
https://issues.apache.org/jira/browse/PARQUET-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727995#comment-17727995
]
ASF GitHub Bot commented on PARQUET-1950:
-----------------------------------------
pitrou commented on PR #164:
URL: https://github.com/apache/parquet-format/pull/164#issuecomment-1570342348
Wild idea: instead of defining core features, how about rephrasing this in
terms of _presets_?
We could have a growing number of calendar-versioned presets, for example:
* Preset 2023.06 : v2 data pages + delta encodings + ZSTD + Snappy + ZLib (+
logical types etc.)
* Preset 2024.11 : the former + byte stream split encoding + LZ4_RAW
* ...
I'm also skeptical that this needs to be advertised in the Thrift metadata.
Presets would mostly serve as a guideline for implementations and as an API
simplification for users.
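As a rough illustration of the preset idea above, here is a minimal Python sketch. Everything in it is hypothetical: the `PRESETS` table, the feature identifiers, and the helper functions are made up for this example and are not part of parquet-format or any implementation. The feature lists simply mirror the two example presets, leaving out the "logical types etc." part.

```python
# Hypothetical preset table: names and feature identifiers are illustrative
# only and do not correspond to any actual parquet-format definition.
PRESETS = {
    "2023.06": frozenset(
        {"DATA_PAGE_V2", "DELTA_ENCODINGS", "ZSTD", "SNAPPY", "ZLIB"}
    ),
}
# Each later preset is "the former" plus the newly added features.
PRESETS["2024.11"] = PRESETS["2023.06"] | {"BYTE_STREAM_SPLIT", "LZ4_RAW"}


def supports_preset(implemented, preset):
    """Return True if every feature of the preset is implemented."""
    return PRESETS[preset] <= set(implemented)


def newest_supported_preset(implemented):
    """Return the latest preset fully covered by `implemented`, or None.

    Preset names use the YYYY.MM form, so plain string sorting is
    chronological.
    """
    supported = [p for p in sorted(PRESETS) if supports_preset(implemented, p)]
    return supported[-1] if supported else None
```

With a table like this, a writer API could accept a single preset name rather than a set of individual encoding and codec options, and nothing would need to be recorded in the Thrift metadata.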
> Define core features / compliance level
> ---------------------------------------
>
> Key: PARQUET-1950
> URL: https://issues.apache.org/jira/browse/PARQUET-1950
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-format
> Reporter: Gabor Szadovszky
> Assignee: Gabor Szadovszky
> Priority: Major
>
> The Parquet format keeps gaining features, while the different
> implementations cannot keep pace and are left with some features
> implemented and others not. In many cases it is also unclear whether a
> given feature is mature enough to be used widely or is still
> experimental.
> These are serious issues that make it hard to ensure interoperability
> between the different implementations.
> The following idea came up in a
> [discussion|https://lists.apache.org/thread.html/rde5cba8443487bccd47593ddf5dfb39f69c729d260165cb936a1a289%40%3Cdev.parquet.apache.org%3E].
> Create a new document in the parquet-format repository that lists the "core
> features". This document is versioned by the parquet-format releases. This
> way a certain version of "core features" defines a level of compatibility
> between the different implementations. This version number can be written to
> a new field (e.g. complianceLevel) in the footer. If an implementation writes
> a file with a version in this field, it must implement all the related "core
> features" (read and write) and must not use any other features when writing,
> because doing so would make the data unreadable by another implementation
> that implements only the same level of "core features".
> For example, if encoding A is listed in the version 1 "core features" but
> encoding B is not, then at "complianceLevel = 1" we can use encoding A but
> not encoding B, because it would make the related data unreadable.
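The write-side rule in the quoted description can be sketched as a simple check. This is only an illustration under stated assumptions: `CORE_FEATURES`, `ENCODING_A`, `ENCODING_B`, and both functions are hypothetical names for this example, and the proposed `complianceLevel` footer field is represented here as a plain integer.

```python
# Hypothetical "core features" table, keyed by compliance level. The
# placeholder names stand in for "encoding A" and "encoding B" from the
# issue description.
CORE_FEATURES = {
    1: frozenset({"ENCODING_A"}),
}


def disallowed_features(compliance_level, used_features):
    """Return the used features that fall outside the level's core set."""
    return sorted(set(used_features) - CORE_FEATURES[compliance_level])


def check_writable(compliance_level, used_features):
    """Raise ValueError if a writer would use a feature outside the level."""
    extra = disallowed_features(compliance_level, used_features)
    if extra:
        raise ValueError(
            f"not allowed at complianceLevel = {compliance_level}: {extra}"
        )
```

At level 1, `check_writable(1, ["ENCODING_A"])` passes silently, while `check_writable(1, ["ENCODING_A", "ENCODING_B"])` raises, because a reader that implements only the level 1 features could not decode data written with encoding B.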