jackye1995 commented on pull request #3752: URL: https://github.com/apache/iceberg/pull/3752#issuecomment-997030992
Sounds like what is needed here is not the ability to skip serialization of the config, but the ability to set the entire `FileIO` for a `SerializableTable` on the worker node. Hadoop configuration might not be the only thing that takes a lot of serialization space, and it seems like we are adding special handling just for that.

There are also some security concerns with this approach, as you say:

> also because executors might not be authorized to communicate with the metastore to load the table

In Trino we have a singleton `FileIOProvider`, which is responsible for creating the `FileIO`s used by worker nodes, to avoid these issues. I don't know if a similar concern is valid for Hive.

I think it would be better to allow skipping serialization of the entire `FileIO` object, add a hook like `SerializableTable.initializeFileIO(fileIO)`, and let each engine create the `FileIO` based on its own requirements, outside the serialization logic.

Any thoughts?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
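To make the proposal concrete, here is a minimal sketch of what such a hook could look like. This is an illustration only: the `FileIOProvider` singleton, its `create()` method, and the `initializeFileIO` wiring are assumptions for this sketch, not the actual Iceberg or Trino APIs.

```java
import java.io.Serializable;

// Stand-in for Iceberg's FileIO interface, reduced to one method for the sketch.
interface FileIO {
    String name();
}

class SerializableTable implements Serializable {
    // Marked transient so it is skipped entirely during serialization;
    // the engine re-attaches a FileIO on the worker node after deserialization.
    private transient FileIO io;

    /** Hypothetical hook: lets the engine supply a FileIO outside serialization. */
    public void initializeFileIO(FileIO fileIO) {
        this.io = fileIO;
    }

    public FileIO io() {
        if (io == null) {
            throw new IllegalStateException("FileIO not initialized on this worker");
        }
        return io;
    }
}

// Engine-side singleton (Trino-style) that builds FileIOs for worker nodes,
// so workers never need to contact the metastore to load the table.
class FileIOProvider {
    static final FileIOProvider INSTANCE = new FileIOProvider();

    FileIO create() {
        return () -> "worker-file-io";
    }
}
```

With this shape, the engine deserializes the table and then calls `table.initializeFileIO(FileIOProvider.INSTANCE.create())` before any file access, keeping credential and configuration handling entirely on the engine side.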
