>
> I'm sorry for being so stubborn and insistent,

Not at all.


> but Parquet files are
> produced routinely by data scientists and other people with no expert
> knowledge of Parquet internals.


I think presets make a lot of sense for an end user. Having something
time-based should give non-experts a better proxy for assessing the risk
that a file won't be readable by existing deployments. But for most
end users, it isn't clear that they would actually touch the default preset.
If users do override the default, hopefully implementation documentation
would highlight the risks and provide a link to the implementation status
page.


> If "how to produce an optimized Parquet file" takes an entire paragraph
> to explain and requires diving into tables of features, then we haven't
> solved the problem.


Right, we need good defaults, and we need implementations to push the
default preset up over time; otherwise new features will not be adopted.
There will always be advanced users who want to tweak things. I don't
think asking people who want to be on the bleeding edge to read a
paragraph is unreasonable; it helps them understand the risks.


> (also, even I don't know what to do with the information of "Arrow C++
> does not support 2025 features": what does it bring to the reader?)


If a user wants to write data that is broadly interpretable in the
ecosystem, or knows that they want to share it with C++-based
implementations, they should avoid using the features Arrow C++ doesn't
support. The table could just as easily be grouped by "version" or
"preset". I would hope most users would never need to consult it if
implementations are reasonably conservative with the features they enable.
Once they do need to consult it, it allows them to decide how they want to
tailor their files (either via individual knobs or by moving to an older
preset).

Cheers,
Micah


On Thu, Jun 11, 2026 at 1:52 AM Antoine Pitrou <[email protected]> wrote:

> Le 10/06/2026 à 16:40, Micah Kornfield a écrit :
> >
> >> In any case, this does not seem to be solving the problem of "as a user,
> >> how do I enable features safely".
> >
> > Can you elaborate? Every feature listed after 2023 has the year it was
> > introduced in parentheses next to it. I think this, in addition to
> > the table showing the version in which everything was supported, can
> > give a user a pretty good idea of what might be safe.
>
> Ok, so concretely, what is a user supposed to do with these tables?
>
> I'm sorry for being so stubborn and insistent, but Parquet files are
> produced routinely by data scientists and other people with no expert
> knowledge of Parquet internals.
>
> If "how to produce an optimized Parquet file" takes an entire paragraph
> to explain and requires diving into tables of features, then we haven't
> solved the problem.
>
>
> (also, even I don't know what to do with the information of "Arrow C++
> does not support 2025 features": what does it bring to the reader?)
>
> Regards
>
> Antoine.
>
>
>
