pvary commented on pull request #2046: URL: https://github.com/apache/iceberg/pull/2046#issuecomment-757929023
> @pvary I skimmed this PR, seems I need more background to understand this change. Let me see the previous committed PRs.

Thanks @openinx for taking the time to check the PR! Feel free to ask any questions here, on Slack, or by email if that is easier than digging everything up. I would be happy to answer them!

I would like to give some context, hope this helps:

With Hive, and maybe for other execution engines too, query compilation and query execution happen on different nodes, and we only send serialized data between them. Execution can also be distributed, and it is unnecessary (and even problematic) for every executor node to look up the table data from the Catalogs.

If we read the table data from the Catalog during compilation and then serialize it, the executor nodes do not need access to the Catalog. It could be enough for them to have S3 access to read the snapshot data themselves.

In a nutshell, what we are trying to achieve here is a way to serialize/deserialize not only BaseTable-s, but every MetadataTable as well.

Thanks,
Peter
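To make the flow above concrete, here is a minimal sketch in plain Java. It is not the code in this PR and it does not use any Iceberg classes: `TableHandle`, its fields, and the example metadata path are hypothetical stand-ins, assumed only for illustration. The point is just that whatever is resolved from the Catalog at compile time travels as bytes, and the executor rebuilds it without touching the Catalog.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names come from Iceberg.
class SerializedTableSketch {

  // Stand-in for the table state resolved from the Catalog at compile time.
  static final class TableHandle implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;
    final String metadataLocation;

    TableHandle(String name, String metadataLocation) {
      this.name = name;
      this.metadataLocation = metadataLocation;
    }
  }

  // Compile side: turn the resolved table into bytes that can be shipped.
  static byte[] serialize(TableHandle table) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(table);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Executor side: rebuild the table from bytes, with no Catalog lookup.
  static TableHandle deserialize(byte[] payload) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
      return (TableHandle) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The name and path below are made-up example values.
    TableHandle compiled = new TableHandle(
        "db.tbl.snapshots", "s3://example-bucket/db/tbl/metadata/v3.metadata.json");
    byte[] payload = serialize(compiled);
    TableHandle onExecutor = deserialize(payload);
    System.out.println(onExecutor.name + " -> " + onExecutor.metadataLocation);
  }
}
```

In this sketch the executor only needs the deserialized metadata location and storage access. The same round trip would have to work for metadata tables, not only for base tables.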
