marton-bod opened a new pull request #3911: URL: https://github.com/apache/iceberg/pull/3911
Skipping the IO config serialization (introduced [here](https://github.com/apache/iceberg/commit/da712eaf60744c933c08fe1cab7a00cdcb2f4829)) and then injecting the config on the deserialization side saves a lot of memory on the query coordinator (e.g. Tez AM), but this approach does not work for some metadata queries.

When running standard queries, the tasks use the IO of the table object, which allows the config to be injected via the `HadoopConfigurable` interface. However, some metadata table tasks, such as [DataFilesTable#ManifestReadTask](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/DataFilesTable.java#L139), keep their own IO instance for reading and provide no API to replace or inject it.

We might want to tackle this in the future so that config serialization skipping can work for metadata queries too. In the interim, this PR disables IO config skipping for these queries.
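To illustrate why injection reaches one kind of task but not the other, here is a minimal, self-contained sketch. It does not use the Iceberg API: `SketchIO`, `SketchTable`, `TableScanTask`, `PrivateIOTask` and `injectConf` are hypothetical names invented for this example, and a `null` config map is assumed to model "config was skipped during serialization".

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigInjectionSketch {

  // Hypothetical stand-in for an IO whose config can be set after deserialization.
  static class SketchIO {
    // null models an IO that was serialized without its config.
    private Map<String, String> conf;

    void setConf(Map<String, String> conf) {
      this.conf = conf;
    }

    boolean hasConf() {
      return conf != null;
    }
  }

  // Hypothetical table that exposes its IO, so callers can inject config into it.
  static class SketchTable {
    private final SketchIO io = new SketchIO();

    SketchIO io() {
      return io;
    }
  }

  // Models a standard task: it reads through the table's IO.
  static class TableScanTask {
    private final SketchTable table;

    TableScanTask(SketchTable table) {
      this.table = table;
    }

    boolean canRead() {
      return table.io().hasConf();
    }
  }

  // Models a metadata-style task: it holds its own IO and exposes no way to replace it.
  static class PrivateIOTask {
    private final SketchIO io = new SketchIO();

    boolean canRead() {
      return io.hasConf();
    }
  }

  // Deserialization-side injection: only the table's IO is reachable from here.
  static void injectConf(SketchTable table, Map<String, String> conf) {
    table.io().setConf(conf);
  }

  public static void main(String[] args) {
    SketchTable table = new SketchTable();
    TableScanTask standardTask = new TableScanTask(table);
    PrivateIOTask metadataTask = new PrivateIOTask();

    injectConf(table, new HashMap<>());

    System.out.println("standard task can read: " + standardTask.canRead());
    System.out.println("metadata-style task can read: " + metadataTask.canRead());
  }
}
```

In this toy model the injected config never reaches `PrivateIOTask`, which is the situation the PR works around by not skipping config serialization for such queries.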
