On normalization, I agree with Ryan. I was part of those other discussions, and I think this is an engine concern, not a storage one.
I'm also ok with basically getting no value from min/max of non-shredded fields.

On Wed, Dec 11, 2024 at 4:35 AM Antoine Pitrou <anto...@python.org> wrote:

> On Mon, 9 Dec 2024 16:33:51 -0800
> "rdb...@gmail.com" <rdb...@gmail.com> wrote:
> > I think that Parquet should exactly reproduce the data that is written to
> > files, rather than either allowing or requiring Parquet implementations to
> > normalize types. To me, that's a fundamental guarantee of the storage
> > layer. The compute layer can decide to normalize types and take actions to
> > make storage more efficient, but storage should not modify the data that is
> > passed to it.
>
> FWIW, I agree with this.
>
> Regards
>
> Antoine.