pvary opened a new pull request #1920: URL: https://github.com/apache/iceberg/pull/1920
Before this change, split generation loads the table and uses it to generate the scan tasks. This could be problematic:

1. Split generation happens on the TezAM. Currently we do not have any connection between the TezAMs and the HMS, so loading the table there could cause extra load and needs extra network configuration and traffic.
2. Split generation happens after query planning, and the Table could have changed in the meantime. In the longer term we have to find a way to use the same snapshot throughout the planning and execution process.

As a first step, this PR creates `StaticTable`, which is a specific snapshot of the Table, and serializes the data required to create this table into the job configuration. This solves 1. and provides a way forward to solve 2.

Since all of the InputFormat tests use the same codepath, no extra tests are added.
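
To illustrate the idea of carrying a pinned snapshot through the job configuration, here is a minimal sketch in plain Java. It is not the code from this PR and does not use the Iceberg API: `StaticTableSketch`, `PinnedTable`, both configuration keys, and the choice of a location plus snapshot id as the serialized data are all hypothetical, and a `Map<String, String>` stands in for the job configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticTableSketch {
    // Hypothetical configuration keys; the PR description does not name the real ones.
    static final String LOCATION_KEY = "example.iceberg.table.location";
    static final String SNAPSHOT_ID_KEY = "example.iceberg.snapshot.id";

    /** Immutable stand-in for a table pinned to a single snapshot. */
    static final class PinnedTable {
        final String location;
        final long snapshotId;

        PinnedTable(String location, long snapshotId) {
            this.location = location;
            this.snapshotId = snapshotId;
        }
    }

    /** Planning side: runs where the metastore is reachable and records the pinned state. */
    static void serialize(PinnedTable table, Map<String, String> conf) {
        conf.put(LOCATION_KEY, table.location);
        conf.put(SNAPSHOT_ID_KEY, Long.toString(table.snapshotId));
    }

    /** Split-generation side: rebuilds the pinned table from the configuration alone. */
    static PinnedTable deserialize(Map<String, String> conf) {
        String location = conf.get(LOCATION_KEY);
        String snapshotId = conf.get(SNAPSHOT_ID_KEY);
        if (location == null || snapshotId == null) {
            throw new IllegalStateException("Pinned table data is missing from the configuration");
        }
        return new PinnedTable(location, Long.parseLong(snapshotId));
    }

    public static void main(String[] args) {
        // The map plays the role of the job configuration shipped to the split generator.
        Map<String, String> jobConf = new HashMap<>();
        serialize(new PinnedTable("hdfs://example/warehouse/db/tbl", 42L), jobConf);

        PinnedTable restored = deserialize(jobConf);
        System.out.println(restored.location + " @ " + restored.snapshotId);
    }
}
```

In this sketch the split-generation side never contacts a metastore, which mirrors problem 1 above, and it reads whatever snapshot id the planning side recorded, which is the direction problem 2 points toward.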
