[GitHub] [iceberg] RussellSpitzer opened a new pull request #1420: WIP - Add an api for Parallelizing Manifest Reading in ManifestGroup

GitBox Thu, 03 Sep 2020 14:10:24 -0700


RussellSpitzer opened a new pull request #1420:
URL: https://github.com/apache/iceberg/pull/1420



   This is one of two WIP PRs demonstrating approaches to parallelizing
   the job planning phase of Spark reads.

   To add distributed manifest reading to table scans, we allow a
   ManifestProcessor to be used in a TableScan. This class is used when
   ManifestGroup processes manifest files. The default implementation
   mimics the current code by just wrapping the reading code in a
   CloseableIterable. A distributed Spark implementation is also
   provided, which reads all of the manifests remotely before returning
   valid entries for further processing.
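
   The pluggable design described above can be illustrated with a rough,
   self-contained sketch. It is not the code in this PR: every name in it
   (ManifestProcessorSketch, the readManifests signature, manifests
   modeled as lists of strings) is hypothetical, and a parallel stream
   stands in for the remote Spark job purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types are Iceberg's real classes.
// A "manifest" is modeled as a list of entry names (strings).
public class ManifestProcessorSketch {

  /** Assumed hook: turns a group of manifests into the entries that pass a filter. */
  interface ManifestProcessor {
    List<String> readManifests(List<List<String>> manifests, Predicate<String> filter);
  }

  /** Default behaviour: read every manifest sequentially on the calling thread. */
  static class LocalManifestProcessor implements ManifestProcessor {
    @Override
    public List<String> readManifests(List<List<String>> manifests, Predicate<String> filter) {
      List<String> result = new ArrayList<>();
      for (List<String> manifest : manifests) {
        for (String entry : manifest) {
          if (filter.test(entry)) {
            result.add(entry);
          }
        }
      }
      return result;
    }
  }

  /**
   * Stand-in for a distributed implementation: each manifest is read and
   * filtered in parallel, and only the valid entries are gathered back.
   * A real Spark version would ship this work to executors instead.
   */
  static class ParallelManifestProcessor implements ManifestProcessor {
    @Override
    public List<String> readManifests(List<List<String>> manifests, Predicate<String> filter) {
      return manifests.parallelStream()
          .flatMap(manifest -> manifest.stream().filter(filter))
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) {
    List<List<String>> manifests = Arrays.asList(
        Arrays.asList("a.parquet", "b.avro"),
        Arrays.asList("c.parquet"));
    Predicate<String> onlyParquet = entry -> entry.endsWith(".parquet");

    // The caller is unaware of which strategy is plugged in.
    ManifestProcessor processor = new ParallelManifestProcessor();
    System.out.println(processor.readManifests(manifests, onlyParquet));
  }
}
```

   The point of the sketch is only that the scan code depends on the
   interface, so the sequential and parallel strategies are
   interchangeable and return the same entries in the same order.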


