rdblue opened a new pull request, #5206: URL: https://github.com/apache/iceberg/pull/5206
This improves job planning performance by moving `ManifestFiles.read` setup into the `ParallelIterator` that is used to plan tasks.

`ParallelIterator` accepts an `Iterable` of `CloseableIterable`. It iterates over the outer iterable to submit tasks that run in the worker pool. In `ManifestGroup`, the `Iterable` that was returned would call `ManifestFiles.read` to prepare each inner iterable, but the `ManifestReader` needs to read Avro file metadata and opens a stream to do so. That initial file open was running in the consuming thread as tasks were submitted, instead of in the worker pool.

This updates `ManifestGroup` to use a custom `Iterable` that defers calling `ManifestFiles.read` until the inner iterable is used.
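The effect of deferring the open can be illustrated with a minimal, self-contained sketch. This is not the code from the PR and does not use Iceberg's classes: `DeferredOpenSketch`, `open`, `deferred`, `eager`, `lazy`, and `planInPool` are hypothetical names, with `open` standing in for `ManifestFiles.read` and a plain `ExecutorService` standing in for the worker pool behind `ParallelIterator`. The sketch records which thread performs each open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

final class DeferredOpenSketch {
  // Names of the threads that performed an "open" (stand-in for reading Avro metadata).
  static final List<String> OPEN_THREADS = new CopyOnWriteArrayList<>();

  // Hypothetical stand-in for ManifestFiles.read: records which thread paid the open cost.
  static Iterable<String> open(String manifest) {
    OPEN_THREADS.add(Thread.currentThread().getName());
    return List.of(manifest + "/data-1", manifest + "/data-2");
  }

  // Wraps an opener so that it runs only when iterator() is called on the result.
  static <T> Iterable<T> deferred(Supplier<Iterable<T>> opener) {
    return () -> opener.get().iterator();
  }

  // Eager variant: open runs while the outer iterable is being iterated.
  static Iterable<Iterable<String>> eager(List<String> manifests) {
    return () -> manifests.stream().<Iterable<String>>map(m -> open(m)).iterator();
  }

  // Deferred variant: the outer iteration only creates cheap wrappers.
  static Iterable<Iterable<String>> lazy(List<String> manifests) {
    return () -> manifests.stream().<Iterable<String>>map(m -> deferred(() -> open(m))).iterator();
  }

  // Submits one task per inner iterable; each task drains its iterable on a pool thread.
  static List<String> planInPool(Iterable<Iterable<String>> tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(2, r -> new Thread(r, "worker"));
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (Iterable<String> task : tasks) { // outer iteration runs in the calling thread
        futures.add(pool.submit(() -> {
          List<String> out = new ArrayList<>();
          task.forEach(out::add); // iterator() on the inner iterable is first called here
          return out;
        }));
      }
      List<String> all = new ArrayList<>();
      for (Future<List<String>> future : futures) {
        all.addAll(future.get());
      }
      return all;
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> manifests = List.of("m1", "m2");
    planInPool(eager(manifests));
    System.out.println("eager opens ran on: " + OPEN_THREADS);
    OPEN_THREADS.clear();
    planInPool(lazy(manifests));
    System.out.println("deferred opens ran on: " + OPEN_THREADS);
  }
}
```

In the eager variant the opens happen on the thread that iterates the outer iterable to submit tasks. In the deferred variant they happen on the pool threads, which is the behavior the description above is aiming for.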
