rdblue commented on issue #113: Truncate stats from Parquet files
URL: https://github.com/apache/incubator-iceberg/issues/113#issuecomment-469824435
 
 
   `DataFile` and `Metrics` are the classes that contain metrics and are good 
candidates for where truncation could happen. I think we would want truncation 
to be configurable using settings in `TableProperties`. Metrics are scraped 
from Parquet metadata in `ParquetMetrics`, which is called by `ParquetWriter`.
   
   You might want to explore passing a truncate length option to 
`ParquetWriter`. The writer would pass it to `ParquetMetrics`, which would 
truncate values as soon as the metrics are collected. The setting would come 
from the table properties when a writer is created. For that, I think you'd 
update the write builder in `Parquet`.
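
   A minimal sketch of the idea above, not the actual Iceberg code. The class 
`TruncateSketch`, the property key, and the default length are all 
hypothetical. The sketch reads a truncate length from a table property map and 
applies it to string bounds. It assumes a positive length and bounds compared 
in code point order. A lower bound can simply be cut to a prefix. An upper 
bound must also have its last kept code point incremented, otherwise the 
truncated value would sort below the original.

```java
import java.util.Map;

final class TruncateSketch {
  // Hypothetical property key and default; not real TableProperties constants.
  static final String TRUNCATE_LENGTH_KEY = "example.metrics.truncate-length";
  static final int TRUNCATE_LENGTH_DEFAULT = 16;

  // Resolve the truncate length from table properties, falling back to the default.
  static int truncateLength(Map<String, String> tableProps) {
    String value = tableProps.get(TRUNCATE_LENGTH_KEY);
    return value == null ? TRUNCATE_LENGTH_DEFAULT : Integer.parseInt(value);
  }

  // A prefix never sorts after the full value, so it remains a valid lower bound.
  static String truncateLowerBound(String value, int length) {
    if (value.codePointCount(0, value.length()) <= length) {
      return value;
    }
    return value.substring(0, value.offsetByCodePoints(0, length));
  }

  // Keep the first `length` code points, then increment the last one that can
  // be incremented so the result sorts after the original value.
  // Returns null when no shorter upper bound exists.
  static String truncateUpperBound(String value, int length) {
    if (value.codePointCount(0, value.length()) <= length) {
      return value;
    }
    int[] codePoints = value.codePoints().limit(length).toArray();
    for (int i = codePoints.length - 1; i >= 0; i--) {
      int next = codePoints[i] + 1;
      // Skip the surrogate range, which holds no valid code points.
      if (next >= Character.MIN_SURROGATE && next <= Character.MAX_SURROGATE) {
        next = Character.MAX_SURROGATE + 1;
      }
      if (next <= Character.MAX_CODE_POINT) {
        codePoints[i] = next;
        return new String(codePoints, 0, i + 1);
      }
    }
    return null;
  }
}
```

   In this sketch, the write builder would call `truncateLength` once with the 
table's properties and hand the result to the writer. The metrics code would 
then apply the two truncate methods to each string bound.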

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

