jackye1995 commented on pull request #3752: URL: https://github.com/apache/iceberg/pull/3752#issuecomment-997030992
Sounds like what is needed here is not the ability to skip serialization of the config, but the ability to set the entire `FileIO` for a `SerializableTable` on the worker node. Hadoop configuration might not be the only thing that takes a lot of serialization space, and it seems like we are adding special handling just for that.

There are also some security concerns with this approach, as you say:

> also because executors might not be authorized to communicate with the metastore to load the table

In Trino we have a singleton `FileIOProvider`, which is responsible for creating the `FileIO`s used by worker nodes, to avoid these issues. I don't know if a similar concern is valid for Hive.

I think it would be better to allow skipping serialization of the entire `FileIO` object, add a hook like `SerializableTable.initializeFileIO(fileIO)`, and let each engine create the `FileIO` based on its own requirements, outside the serialization logic.

Any thoughts?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
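To make the proposal concrete, here is a minimal sketch of what such a hook could look like. This is an illustration only: the `FileIOProvider` singleton, its `create()` method, and the `initializeFileIO` wiring are assumptions for this sketch, not the actual Iceberg or Trino APIs.

```java
import java.io.Serializable;

// Stand-in for Iceberg's FileIO interface, reduced to one method for the sketch.
interface FileIO {
    String name();
}

class SerializableTable implements Serializable {
    // Marked transient so it is skipped entirely during serialization;
    // the engine re-attaches a FileIO on the worker node after deserialization.
    private transient FileIO io;

    /** Hypothetical hook: lets the engine supply a FileIO outside serialization. */
    public void initializeFileIO(FileIO fileIO) {
        this.io = fileIO;
    }

    public FileIO io() {
        if (io == null) {
            throw new IllegalStateException("FileIO not initialized on this worker");
        }
        return io;
    }
}

// Engine-side singleton (Trino-style) that builds FileIOs for worker nodes,
// so workers never need to contact the metastore to load the table.
class FileIOProvider {
    static final FileIOProvider INSTANCE = new FileIOProvider();

    FileIO create() {
        return () -> "worker-file-io";
    }
}
```

With this shape, the engine deserializes the table and then calls `table.initializeFileIO(FileIOProvider.INSTANCE.create())` before any file access, keeping credential and configuration handling entirely on the engine side.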
