szehon-ho commented on PR #4596: URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1136529127
Took a little a look through the code, ParallelIterable.hasNext() calls [checkTasks() ](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java#L97) , which then calls submitNextTask() but only up to taskFuture size (configured thread pool size). [submitNextTask](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java#L129) is the one that calls tasks.next(), which adds things to the queue. I guess that's why @rdblue is mentioning, it is self-limiting. The problem may be that each [tasks.next()](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java#L65) does dump the entire nested Iterable into the queue. So in real life example of the planning case, it seems the level parallelism is manifest. New manifests may be blocked to be processed if nobody is consuming, but a single or few manifests having a lot of entries may cause memory pressure as they dump all of the entries. @lirui-apache @uncleGen is that the case? Probably the fix is to rewriteManifests to have more reasonable number of entries? I guess if we use blocking queue, it may be useful and we have another knob to limit the memory, but then it's not truly parallel by manifest, ie some manifests may be blocked a long time before they can submit their entries to the queue. @rdblue let me know if you have any thoughts or if I may have misread the code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
