szehon-ho commented on PR #4596:
URL: https://github.com/apache/iceberg/pull/4596#issuecomment-1136529127

   Took a quick look through the code.
   
   ParallelIterable.hasNext() calls [checkTasks()](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java#L97), which then calls submitNextTask(), but only up to the taskFutures size (the configured thread pool size).
   
   
   [submitNextTask()](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java#L129) is the one that calls tasks.next(); the submitted task is what adds items to the queue.  I guess that's what @rdblue means by it being self-limiting.
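   To make that concrete, here is a minimal sketch of my reading of the linked code; class and field names are approximations, not the actual ParallelIterable implementation:

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Simplified sketch of my reading of ParallelIterable's iterator; names and structure
// are approximate, not the real implementation.
class SketchParallelIterator<T> {
  private final Iterator<Runnable> tasks;   // one drain task per nested iterable (e.g. per manifest)
  private final ExecutorService workerPool;
  private final Future<?>[] taskFutures;    // one slot per worker thread
  private final Queue<T> queue = new ConcurrentLinkedQueue<>();

  SketchParallelIterator(Iterator<Runnable> tasks, ExecutorService workerPool, int poolSize) {
    this.tasks = tasks;
    this.workerPool = workerPool;
    this.taskFutures = new Future<?>[poolSize];
  }

  // Called from hasNext(): at most taskFutures.length tasks are in flight at once,
  // so task submission is bounded by the pool size -- the "self-limiting" behavior.
  boolean checkTasks() {
    boolean hasRunningTask = false;
    for (int i = 0; i < taskFutures.length; i += 1) {
      if (taskFutures[i] == null || taskFutures[i].isDone()) {
        taskFutures[i] = submitNextTask();  // refill a free slot if more tasks remain
      }
      if (taskFutures[i] != null) {
        hasRunningTask = true;
      }
    }
    return !queue.isEmpty() || hasRunningTask || tasks.hasNext();
  }

  private Future<?> submitNextTask() {
    return tasks.hasNext() ? workerPool.submit(tasks.next()) : null;
  }
}
```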
   
   The problem may be that each [tasks.next()](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java#L65) task dumps the entire nested Iterable into the queue.
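   In other words, each submitted task behaves roughly like the sketch below (a simplification, not the actual code); the queue in the real class is an unbounded ConcurrentLinkedQueue, so a huge manifest gets buffered entirely in memory regardless of whether the consumer keeps up:

```java
import java.util.Queue;

// Simplified sketch of what each submitted task does, per my reading of the linked line;
// not the actual implementation.
class DrainTaskSketch {
  static <T> Runnable drainTask(Iterable<T> nestedIterable, Queue<T> queue) {
    return () -> {
      for (T item : nestedIterable) {  // e.g. every entry of one manifest
        queue.add(item);               // add() never blocks on an unbounded queue, so all
                                       // entries pile up in memory if consumption is slow
      }
    };
  }
}
```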
   
   So in the real-life planning case, it seems the parallelism is at the manifest level.  New manifests may be blocked from being processed if nobody is consuming, but a single manifest (or a few) with a lot of entries can still cause memory pressure, because all of their entries get dumped into the queue.  @lirui-apache @uncleGen is that the case?  If so, maybe the fix is to rewriteManifests so each manifest has a more reasonable number of entries?
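   For reference, if oversized manifests turn out to be the culprit, something like the snippet below could split them; this is only a sketch (exact options depend on your Iceberg/Spark versions), with the target size and the 32 MB threshold picked arbitrarily for illustration:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteManifests;
import org.apache.iceberg.spark.actions.SparkActions;
import org.apache.spark.sql.SparkSession;

class RewriteManifestsSketch {
  // Sketch only: shrink the target manifest size, then rewrite manifests larger than an
  // arbitrary threshold so each manifest holds fewer entries.
  static void rewriteLargeManifests(SparkSession spark, Table table) {
    table.updateProperties()
        .set("commit.manifest.target-size-bytes", String.valueOf(8L * 1024 * 1024))
        .commit();

    RewriteManifests.Result result = SparkActions.get(spark)
        .rewriteManifests(table)
        .rewriteIf(manifest -> manifest.length() > 32L * 1024 * 1024)
        .execute();
  }
}
```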
   
   I guess using a blocking queue may be useful, since it would give us another knob to limit memory, but then it's not truly parallel by manifest, i.e. some manifests may be blocked for a long time before they can submit their entries to the queue.  @rdblue let me know if you have any thoughts, or if I may have misread the code.
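   To illustrate that trade-off, a hypothetical version of the drain task using a bounded BlockingQueue (again just a sketch, not what this PR does) would look something like this:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the blocking-queue idea. A bounded queue caps memory, but put()
// blocks the worker thread, so a manifest's task can stall for a long time when the
// consumer is slow -- the trade-off mentioned above.
class BoundedDrainTaskSketch {
  static <T> Runnable boundedDrainTask(Iterable<T> nestedIterable, BlockingQueue<T> queue) {
    return () -> {
      try {
        for (T item : nestedIterable) {
          queue.put(item);  // blocks when the queue is full (backpressure)
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
  }

  // e.g. BlockingQueue<Object> queue = new LinkedBlockingQueue<>(10_000);
}
```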

