pvary commented on pull request #2046:
URL: https://github.com/apache/iceberg/pull/2046#issuecomment-757929023


   > @pvary I skimmed this PR, seems I need more background to understand this 
change. Let me see the previous committed PRs.
   
   Thanks @openinx for taking the time to check the PR!
   Feel free to ask any questions here, on Slack, or by email if you feel it 
is easier than digging up everything. I would be happy to answer them!
   
   I would like to give some context - hope this helps:
   With Hive, and maybe with other execution engines too, query compilation 
and query execution happen on different nodes, and we only send serialized 
data between them. The execution could also happen in a distributed mode, and 
it is unnecessary (and even problematic) for every executor node to look up 
the table data from the Catalogs. If we read the table data from the Catalog 
during compilation and then serialize it, the executor nodes do not need 
access to the Catalog; it could be enough for them to have S3 access to read 
the snapshot data themselves.
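   
   To illustrate the idea, here is a minimal sketch using plain Java 
serialization. It is not the code from this PR: `TableHandle`, its fields, and 
the example S3 path are hypothetical stand-ins for whatever the compiler node 
resolves from the Catalog. The point is only that the executor side rebuilds 
the handle from bytes and never contacts the Catalog.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializedTableSketch {

  // Hypothetical stand-in for a resolved table; NOT an Iceberg class.
  static final class TableHandle implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;              // table identifier resolved at compile time
    final String metadataLocation;  // where the executor can read metadata from
    final String metadataTableType; // null for a base table, e.g. "SNAPSHOTS" otherwise

    TableHandle(String name, String metadataLocation, String metadataTableType) {
      this.name = name;
      this.metadataLocation = metadataLocation;
      this.metadataTableType = metadataTableType;
    }
  }

  // Compiler side: runs after the one and only Catalog lookup.
  static byte[] serialize(TableHandle table) {
    try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(table);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Executor side: rebuilds the handle from bytes, no Catalog access needed.
  static TableHandle deserialize(byte[] data) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (TableHandle) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical values; in practice these would come from the Catalog.
    TableHandle resolved =
        new TableHandle("db.tbl", "s3://bucket/db/tbl/metadata/v3.metadata.json", "SNAPSHOTS");

    byte[] payload = serialize(resolved);      // shipped with the query plan
    TableHandle onExecutor = deserialize(payload);

    System.out.println(onExecutor.name + " -> " + onExecutor.metadataLocation);
  }
}
```

   The `metadataTableType` field hints at why metadata tables need their own 
handling: the executor has to know which view of the table to rebuild, not 
just where the metadata lives.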
   
   In a nutshell, what we are trying to achieve here is a way to 
serialize/deserialize not only BaseTable-s, but every MetadataTable as well.
   
   Thanks,
   Peter




