yohengyangyang edited a comment on issue #3741:
URL: https://github.com/apache/iceberg/issues/3741#issuecomment-995691825


   > I'm confused about the task.next() issue. From what I can see, task.next
   > should be checking the parallel iterable queue, which starts by calling
   > hasNext, which repopulates the queue if it has any empty slots. It
   > populates these slots with a runnable of the entry in the iterable.
   > 
   > 
https://github.com/apache/iceberg/blob/f71091539e1fa9e4064cfbc5141fe3e890e1a5f0/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java#L63-L73
   > 
   > Also, has anyone checked whether just increasing the parallelism of
   > the work pool would work?
   
   @RussellSpitzer What you said is correct: there is no problem with the logic
of ParallelIterable itself. The problem is that iterables is actually a
TransformedIterator, produced by the wrapping in the ManifestGroup#entries
method, so a call to iterables.next first runs the transform logic linked
below. That code goes to HDFS to read the Avro file, which is sometimes very
slow, and this blocks the submission.
   
   
https://github.com/apache/iceberg/blob/f71091539e1fa9e4064cfbc5141fe3e890e1a5f0/core/src/main/java/org/apache/iceberg/ManifestGroup.java#L239-L264
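
   To make the effect concrete, here is a minimal sketch. It is not the Iceberg
code: LazyTransformDemo, lazyTransform and submitAll are hypothetical names,
and the "open" step only records a thread name instead of reading from HDFS.
It shows that when an iterator applies its transform inside next(), the
transform runs on the thread that submits tasks, not on the worker pool:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class LazyTransformDemo {

  // Hypothetical stand-in for a lazily transformed iterator:
  // the transform function is applied inside next(), on the caller's thread.
  static <A, B> Iterator<B> lazyTransform(Iterator<A> source, Function<A, B> fn) {
    return new Iterator<B>() {
      @Override
      public boolean hasNext() {
        return source.hasNext();
      }

      @Override
      public B next() {
        return fn.apply(source.next());
      }
    };
  }

  // Submits one task per manifest and returns the names of the threads
  // on which the (simulated) open step ran.
  public static List<String> submitAll(List<String> manifests) {
    List<String> openedOn = new ArrayList<>();

    // Assumption: this lambda plays the role of the slow file open.
    Function<String, Runnable> openManifest = manifest -> {
      openedOn.add(Thread.currentThread().getName());
      return () -> { }; // the task body itself does nothing here
    };

    Iterator<Runnable> tasks = lazyTransform(manifests.iterator(), openManifest);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      while (tasks.hasNext()) {
        // tasks.next() runs openManifest before anything reaches the pool.
        pool.submit(tasks.next());
      }
    } finally {
      pool.shutdown();
      try {
        pool.awaitTermination(5, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return openedOn;
  }

  public static void main(String[] args) {
    List<String> openedOn = submitAll(Arrays.asList("m1", "m2", "m3"));
    System.out.println("open step ran on: " + openedOn);
  }
}
```

   In this sketch every open happens on the submitting thread, one after the
other, regardless of the pool size. That would also suggest why a larger pool
alone may not help: the workers only receive a task after the slow step has
already finished.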


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


