For normalization I agree with Ryan. I was part of those other discussions,
and I think this is an engine concern and not a storage one.

I'm also ok with basically getting no value from min/max of non-shredded
fields.

On Wed, Dec 11, 2024 at 4:35 AM Antoine Pitrou <anto...@python.org> wrote:

> On Mon, 9 Dec 2024 16:33:51 -0800 "rdb...@gmail.com" <rdb...@gmail.com> wrote:
> > I think that Parquet should exactly reproduce the data that is written to
> > files, rather than either allowing or requiring Parquet implementations to
> > normalize types. To me, that's a fundamental guarantee of the storage
> > layer. The compute layer can decide to normalize types and take actions to
> > make storage more efficient, but storage should not modify the data that is
> > passed to it.
>
> FWIW, I agree with this.
>
> Regards
>
> Antoine.
>
>
>
