rdblue opened a new pull request, #5206:
URL: https://github.com/apache/iceberg/pull/5206

   This improves job planning performance by moving `ManifestFiles.read` setup 
into the `ParallelIterator` that is used to plan tasks. `ParallelIterator` 
accepts an `Iterable` of `CloseableIterable`. The outer iterable is iterated 
in the consuming thread to submit tasks, and each inner iterable is consumed by
a task running in the worker pool. In `ManifestGroup`, the returned outer
`Iterable` called `ManifestFiles.read` to prepare each inner iterable, but
creating a `ManifestReader` opens a stream to read the Avro file metadata. As a
result, that initial file open ran in the consuming thread as tasks were
submitted, instead of in the worker pool.
   
   This updates `ManifestGroup` to use a custom `Iterable` that defers calling 
`ManifestFiles.read` until the inner iterable is iterated, so the file open
happens in the worker pool.
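   The deferral pattern can be sketched in plain Java as follows. This is a minimal illustration, not the code in this PR: it assumes a simple `Iterable`/`Supplier` model instead of Iceberg's `CloseableIterable` and `ParallelIterator`, and `DeferredIterable`, `readManifest`, and `plan` are hypothetical names. The wrapper's constructor only captures a supplier, so the simulated open in `readManifest` runs in whichever worker thread starts iterating, not in the thread that submits tasks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public class DeferredOpen {

  /** Hypothetical Iterable that calls its supplier only when iteration starts. */
  static final class DeferredIterable<T> implements Iterable<T> {
    private final Supplier<Iterable<T>> open;

    DeferredIterable(Supplier<Iterable<T>> open) {
      this.open = open; // nothing is opened here
    }

    @Override
    public Iterator<T> iterator() {
      // The costly open runs in the thread that begins iterating.
      return open.get().iterator();
    }
  }

  /** Names of the threads that performed each simulated open. */
  static final List<String> OPENED_IN = Collections.synchronizedList(new ArrayList<>());

  /** Hypothetical stand-in for ManifestFiles.read: "opens" a file and lists entries. */
  static Iterable<String> readManifest(String path) {
    OPENED_IN.add(Thread.currentThread().getName());
    return List.of(path + "/entry-0", path + "/entry-1");
  }

  /** Submits one task per inner iterable, loosely mimicking a parallel planner. */
  static List<String> plan(List<String> manifestPaths) throws Exception {
    List<Iterable<String>> inner = new ArrayList<>();
    for (String path : manifestPaths) {
      // Only the supplier is captured in the submitting thread.
      inner.add(new DeferredIterable<>(() -> readManifest(path)));
    }

    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (Iterable<String> it : inner) {
        futures.add(pool.submit(() -> {
          List<String> out = new ArrayList<>();
          for (String entry : it) { // first iteration triggers readManifest here
            out.add(entry);
          }
          return out;
        }));
      }
      List<String> results = new ArrayList<>();
      for (Future<List<String>> f : futures) {
        results.addAll(f.get()); // collect in submission order
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(plan(List.of("m1.avro", "m2.avro")));
    System.out.println("opened in: " + OPENED_IN);
  }
}
```

   Running `main` prints the planned entries and the names of the pool threads that performed each open; the main thread never appears in `OPENED_IN`. Had the wrapper called `readManifest` eagerly in its constructor, every open would instead be recorded against the submitting thread, which is the behavior described above.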
