rdblue commented on issue #113: Truncate stats from Parquet files
URL: https://github.com/apache/incubator-iceberg/issues/113#issuecomment-469824435

`DataFile` and `Metrics` are the classes that contain metrics and are good candidates for where truncation could happen. I think we would want truncation to be configurable using settings in `TableProperties`. Metrics are scraped from Parquet metadata in `ParquetMetrics`, which is called by `ParquetWriter`. You might want to explore passing a truncate length option to `ParquetWriter`. The writer would pass it to `ParquetMetrics` to truncate values right away. The setting would come from the table when creating a writer. For that, I think you'd update the write builder in `Parquet`.
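
To make the truncation step concrete, here is a minimal sketch of what truncating string bounds to a configured length might look like. The class `TruncateSketch` and its methods are hypothetical and are not part of Iceberg. The sketch assumes bounds are compared in Unicode code point order and that the truncate length is positive. A lower bound can simply be cut to a prefix, while an upper bound needs its last kept code point incremented so that it still sorts at or above the original value.

```java
import java.util.Optional;

// Hypothetical helper, not an Iceberg class: illustrates truncating
// string min/max statistics to a fixed number of code points.
public final class TruncateSketch {
  private TruncateSketch() {}

  // A prefix always sorts at or below the full string, so it remains
  // a valid (if looser) lower bound.
  public static String truncateLower(String value, int length) {
    checkLength(length);
    if (value.codePointCount(0, value.length()) <= length) {
      return value;
    }
    return value.substring(0, value.offsetByCodePoints(0, length));
  }

  // A plain prefix would sort below the original, so the last kept code
  // point is incremented. If it cannot be incremented, the position is
  // dropped and the previous one is tried. Empty means no valid bound.
  public static Optional<String> truncateUpper(String value, int length) {
    checkLength(length);
    if (value.codePointCount(0, value.length()) <= length) {
      return Optional.of(value);
    }
    int[] codePoints = value.codePoints().limit(length).toArray();
    for (int i = codePoints.length - 1; i >= 0; i--) {
      int next = codePoints[i] + 1;
      boolean isSurrogate =
          next >= Character.MIN_SURROGATE && next <= Character.MAX_SURROGATE;
      if (next <= Character.MAX_CODE_POINT && !isSurrogate) {
        codePoints[i] = next;
        return Optional.of(new String(codePoints, 0, i + 1));
      }
    }
    return Optional.empty();
  }

  private static void checkLength(int length) {
    if (length <= 0) {
      throw new IllegalArgumentException("Truncate length must be positive");
    }
  }
}
```

In the flow described above, a helper along these lines would be called where metrics are collected, with the length supplied by the writer from a table-level setting.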
