singhpk234 commented on PR #13400:
URL: https://github.com/apache/iceberg/pull/13400#issuecomment-3408871500

   Thank you so much for hands on POC Amogh (i am reading some of these 
interface first time) , It clean and nicely abstracted !
   Here is my understanding, please let me know what your thoughts are  : 
   1. 
[RecursiveTask](https://github.com/amogh-jahagirdar/iceberg/blob/separate-pool-with-queues/core/src/main/java/org/apache/iceberg/RESTTableScan.java#L261)
 is a really nice abstraction to incorporate this nested tree traversal, 
specially in the BFS manner 
    - we call the `invokeAll()` immediately 
([here](https://github.com/amogh-jahagirdar/iceberg/blob/separate-pool-with-queues/core/src/main/java/org/apache/iceberg/RESTTableScan.java#L307))
 since this is a blocking call this would mean ForkJoinPool will hit plan 
endpoint for all the queries concurrently, this might lead to unnecessary call 
when things like LIMIT is posed ?
    2 
[FlatteningTaskIterable](https://github.com/amogh-jahagirdar/iceberg/blob/separate-pool-with-queues/core/src/main/java/org/apache/iceberg/RESTTableScan.java#L325)
 when we are offering to the queue, if the queue has already reached to the 
capacity, essentially `queue.offer()` returning false, we would have to wait 
and hence essentially invokeAll waits ?
    
   Presently i am doing a DFS and the ScanIterable i am using is not great, it 
tracks a List<FileScanTask> in itself and ParallelIterable contains at one 
point both the Leaf and non-Leaf nodes which are hard to reason about :( 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to