RussellSpitzer opened a new pull request #1420: URL: https://github.com/apache/iceberg/pull/1420
This is one of two WIP PRs demonstrating approaches to parallelizing the job planning phase of Spark reads. To add distributed manifest reading to table scans, we allow a ManifestProcessor to be used in a TableScan. This class is used when ManifestGroup processes manifest files. The default implementation mimics the current code by just wrapping the reading code in a CloseableIterable. A distributed Spark implementation is also provided, which reads all of the manifests remotely before returning valid entries for further processing.
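As a rough illustration of the pluggable-processor idea described above, here is a minimal, self-contained sketch. It does not use the Iceberg API: the `ManifestProcessor` interface shape, `LocalManifestProcessor`, `ParallelManifestProcessor`, `fakeRead`, and `plan` are all hypothetical names made up for this example, manifests and entries are plain strings, and a Java parallel stream stands in for a Spark job. The local variant also collects eagerly into a list rather than returning a lazy CloseableIterable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ManifestProcessorSketch {

  // Hypothetical stand-in for the PR's ManifestProcessor; NOT the actual Iceberg interface.
  interface ManifestProcessor {
    List<String> readManifests(List<String> manifests,
                               Function<String, List<String>> reader,
                               Predicate<String> isValid);
  }

  // Local strategy: read each manifest in turn on the calling thread,
  // roughly mirroring the "wrap the existing reading code" default.
  static class LocalManifestProcessor implements ManifestProcessor {
    @Override
    public List<String> readManifests(List<String> manifests,
                                      Function<String, List<String>> reader,
                                      Predicate<String> isValid) {
      List<String> entries = new ArrayList<>();
      for (String manifest : manifests) {
        for (String entry : reader.apply(manifest)) {
          if (isValid.test(entry)) {
            entries.add(entry);
          }
        }
      }
      return entries;
    }
  }

  // Parallel strategy: read all manifests concurrently, then return only valid entries.
  // A parallel stream is an assumption used here in place of a real Spark job.
  static class ParallelManifestProcessor implements ManifestProcessor {
    @Override
    public List<String> readManifests(List<String> manifests,
                                      Function<String, List<String>> reader,
                                      Predicate<String> isValid) {
      return manifests.parallelStream()
          .flatMap(manifest -> reader.apply(manifest).stream())
          .filter(isValid)
          .collect(Collectors.toList());
    }
  }

  // Hypothetical reader: pretends every manifest lists one live and one deleted file.
  static List<String> fakeRead(String manifest) {
    return Arrays.asList(manifest + "/data-1.parquet", manifest + "/deleted-2.parquet");
  }

  // The planning code depends only on the interface, so the strategy can be swapped.
  static List<String> plan(ManifestProcessor processor) {
    return processor.readManifests(
        Arrays.asList("m1", "m2"),
        ManifestProcessorSketch::fakeRead,
        entry -> !entry.contains("deleted"));
  }

  public static void main(String[] args) {
    System.out.println(plan(new LocalManifestProcessor()));
    System.out.println(plan(new ParallelManifestProcessor()));
  }
}
```

Because the caller only sees the interface, both strategies yield the same entries for the same input; the difference is where the reading work happens.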
