[GitHub] [iceberg] RussellSpitzer opened a new pull request #1421: Add a Parallelized Spark Job Planning Path

GitBox Mon, 26 Oct 2020 07:11:47 -0700


RussellSpitzer opened a new pull request #1421:
URL: https://github.com/apache/iceberg/pull/1421



   This is the second of two WIPs for parallelizing Spark read job planning.
   
   The other is located at https://github.com/apache/iceberg/pull/1420
   
   To parallelize the creation of TableScanTasks, we use the
   metadata tables to get a listing of DataFiles and filter that
   listing in Spark before starting the scan job. Once the correct
   data files are identified, scan tasks are created and returned.
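
The flow described above can be modeled roughly as follows. This is a minimal plain-Python sketch, not Spark or Iceberg code: `DataFileRow`, `ScanTask`, `may_contain`, `plan_partition`, and `plan_parallel` are all hypothetical names, the single integer column with lower/upper bounds is an assumption, and a thread pool stands in for Spark executors.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileRow:
    """One row of a hypothetical data-file listing read from a metadata table."""
    path: str
    lower: int  # assumed lower bound of the filtered column in this file
    upper: int  # assumed upper bound of the filtered column in this file


@dataclass(frozen=True)
class ScanTask:
    """Hypothetical stand-in for a scan task over a single data file."""
    path: str


def may_contain(row, lo, hi):
    # Keep a file only if its [lower, upper] range overlaps the query range.
    return row.upper >= lo and row.lower <= hi


def plan_partition(rows, lo, hi):
    # Work done by one worker: filter its slice of the listing, build tasks.
    return [ScanTask(r.path) for r in rows if may_contain(r, lo, hi)]


def plan_parallel(rows, lo, hi, workers=4):
    # Split the listing into slices and plan each slice concurrently.
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: plan_partition(c, lo, hi), chunks)
    # Sort so the result does not depend on how rows were sliced.
    return sorted((t for part in parts for t in part), key=lambda t: t.path)


listing = [
    DataFileRow("f1.parquet", 0, 9),
    DataFileRow("f2.parquet", 10, 19),
    DataFileRow("f3.parquet", 20, 29),
]
tasks = plan_parallel(listing, lo=12, hi=22)
```

Here the file listing is treated as ordinary data, so filtering can be distributed across workers instead of running in a single planning thread; only the files whose bounds overlap the query range end up as tasks.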


