alamb commented on PR #9678: URL: https://github.com/apache/arrow-rs/pull/9678#issuecomment-4225716362
> > A lot of other parquet implementations require this field, due to their generated thrift parser, even if they do not actually use the field for anything. I would be totally in favor of deprecating and skipping the field, but maybe a more compatible alternative would be to write the field as an empty list instead.
>
> Well, parquet-java actually uses the field to populate its version of `ColumnDescriptor`, so an empty list will be just as damaging to an old version. The whole idea is to change the field to optional in the thrift definition, and give the ecosystem a few years for that change to percolate. After some reasonable time has passed we can default to not writing the field. But in the meantime, users who have data sensitive to metadata bloat and know they have up-to-date tooling can help themselves earlier.

See also related mailing list discussion:
- https://lists.apache.org/thread/czm2bk45wwtkhhpqxqvmx9dk5wkwk1kt

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
