[GitHub] [iceberg] marton-bod opened a new pull request #3911: Hive: Do not skip IO config serialization for metadata queries

GitBox Mon, 17 Jan 2022 07:15:32 -0800


marton-bod opened a new pull request #3911:
URL: https://github.com/apache/iceberg/pull/3911



   Skipping the IO config serialization (introduced 
[here](https://github.com/apache/iceberg/commit/da712eaf60744c933c08fe1cab7a00cdcb2f4829))
 and then injecting the config on the deserialization side saves a lot of 
memory on the query coordinator (e.g. Tez AM), but this approach does not work 
for some of the metadata queries. 
   
   When running standard queries, the tasks use the IO of the table object, 
which provides a way to inject the config via the `HadoopConfigurable` 
interface. However, some metadata table tasks, such as 
[DataFilesTable#ManifestReadTask](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/DataFilesTable.java#L139),
 keep their own IO instance for reading and provide no API to replace that 
instance or inject config into it.
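
   To make the failure mode concrete, here is a minimal, self-contained sketch 
using plain Java serialization. It is not Iceberg code: `ConfigInjectable`, 
`SketchIO`, `SketchTable`, `MetadataTask` and `SerializationSketch` are 
hypothetical stand-ins, and a `Map<String, String>` stands in for the Hadoop 
config. It only illustrates why an IO reachable through the table can be 
repaired after deserialization, while an IO held privately by a task cannot.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;

// Hypothetical stand-in for the role HadoopConfigurable plays in the description above.
interface ConfigInjectable {
  void setConf(Map<String, String> conf);
}

// An IO whose config is not written to the serialized form (the "skipping").
class SketchIO implements Serializable, ConfigInjectable {
  private transient Map<String, String> conf; // transient: dropped on serialization

  SketchIO(Map<String, String> conf) {
    this.conf = conf;
  }

  @Override
  public void setConf(Map<String, String> conf) {
    this.conf = conf;
  }

  boolean hasConf() {
    return conf != null;
  }
}

// The table exposes its IO, so the receiving side can inject the config again.
class SketchTable implements Serializable {
  private final SketchIO io;

  SketchTable(SketchIO io) {
    this.io = io;
  }

  SketchIO io() {
    return io;
  }
}

// A metadata-style task that keeps its own IO and exposes no way to reach it.
class MetadataTask implements Serializable {
  private final SketchIO io;

  MetadataTask(SketchIO io) {
    this.io = io;
  }

  boolean canRead() {
    return io.hasConf();
  }
}

class SerializationSketch {
  // Serializes and deserializes an object in memory, as if shipping it to a worker.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Round trip failed", e);
    }
  }
}
```

   In this sketch, a round-tripped `SketchTable` comes back without config, but 
calling `table.io().setConf(...)` restores it. A round-tripped `MetadataTask` 
also comes back without config, and nothing outside the task can fix that.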
   
   We might want to tackle this in the future so that config serialization 
skipping can work for metadata queries too. In the interim, this PR disables IO 
config skipping for these queries.
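
   The interim behaviour amounts to a guard of roughly the following shape. 
This is an illustrative sketch only, not the code in this PR: the class and 
method names are hypothetical, and recognizing a metadata query by a 
three-part identifier ending in a known suffix is an assumption made for the 
example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SkipDecisionSketch {
  // Hypothetical subset of metadata table suffixes, for illustration only.
  private static final Set<String> METADATA_SUFFIXES =
      new HashSet<>(Arrays.asList("files", "manifests", "snapshots", "history"));

  // Assumption: a metadata query targets an identifier like "db.tbl.files".
  static boolean isMetadataQuery(String tableIdentifier) {
    String[] parts = tableIdentifier.split("\\.");
    return parts.length == 3
        && METADATA_SUFFIXES.contains(parts[2].toLowerCase(Locale.ROOT));
  }

  // Skip the IO config serialization only when the query is not a metadata query.
  static boolean shouldSkipIoConfigSerialization(String tableIdentifier) {
    return !isMetadataQuery(tableIdentifier);
  }
}
```

   With this guard, standard queries keep the memory saving on the 
coordinator, while metadata queries ship the full IO config and stay readable.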


