rdblue commented on issue #106: Deep copy maps and lists in GenericDataFile.
URL: https://github.com/apache/incubator-iceberg/pull/106#issuecomment-465769700

@rdsr, the data files themselves are not modified, although you could build a process that gathers these stats from a file and replaces the old DataFile with a new one with the data.

The case that this is addressing is reuse of container objects while scanning manifest files. To avoid object creation, Iceberg will reuse Record, Map, and List objects and fill them with new data. That cuts down on object churn when most records are discarded because a file was deleted or doesn't match a filter.

When a file is selected for a scan in `planFiles`, the DataFile that is returned is a copy because the next record read from the manifest will refill the reused record with new data. Unfortunately, that copy wasn't deep copying these maps, so the wrong data was returned.
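
To illustrate the failure mode described above, here is a minimal sketch in plain Java. It is not Iceberg's code: `FileEntry`, `ReusingReader`, and `scan` are hypothetical stand-ins for a manifest entry, a reader that reuses its containers, and the selection loop. It only shows why a shallow copy of a reused map returns the wrong data and a deep copy does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReusedContainerDemo {
  // Hypothetical stand-in for a manifest entry; not Iceberg's GenericDataFile.
  static class FileEntry {
    String path;
    Map<Integer, Long> valueCounts;

    FileEntry(String path, Map<Integer, Long> valueCounts) {
      this.path = path;
      this.valueCounts = valueCounts;
    }

    // Shallow copy: the new entry still points at the reader's reused map.
    FileEntry shallowCopy() {
      return new FileEntry(path, valueCounts);
    }

    // Deep copy: the map contents are copied, so later refills cannot change them.
    FileEntry deepCopy() {
      return new FileEntry(path, valueCounts == null ? null : new HashMap<>(valueCounts));
    }
  }

  // Hypothetical reader that refills one entry and one map for every record.
  static class ReusingReader {
    private final Map<Integer, Long> reusedMap = new HashMap<>();
    private final FileEntry reused = new FileEntry(null, reusedMap);

    FileEntry read(String path, long count) {
      reusedMap.clear();
      reusedMap.put(1, count);
      reused.path = path;
      return reused;
    }
  }

  // Reads two records, keeps a copy of each, and returns the stored counts.
  static List<Long> scan(boolean deep) {
    ReusingReader reader = new ReusingReader();
    List<FileEntry> selected = new ArrayList<>();
    String[] paths = {"a.parquet", "b.parquet"};
    long[] counts = {10L, 20L};
    for (int i = 0; i < paths.length; i++) {
      FileEntry entry = reader.read(paths[i], counts[i]);
      selected.add(deep ? entry.deepCopy() : entry.shallowCopy());
    }
    List<Long> result = new ArrayList<>();
    for (FileEntry e : selected) {
      result.add(e.valueCounts.get(1));
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println("shallow: " + scan(false)); // shallow: [20, 20]
    System.out.println("deep: " + scan(true));     // deep: [10, 20]
  }
}
```

With the shallow copy, both selected entries see the count from the last record read, because they share the one reused map. The `HashMap` copy constructor copies only one level, which is enough here because the values are immutable `Long`s; nested mutable values would need their own copies.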
