ajantha-bhat commented on PR #4674:
URL: https://github.com/apache/iceberg/pull/4674#issuecomment-1118111005

   @szehon-ho :
   
   > @RussellSpitzer pointed me to this. I had a PR 
(https://github.com/apache/iceberg/pull/3457) that is orthogonal to this one; 
it avoids duplicate computation of all_reachable_files. To me that was the 
bigger time consumer (exploring all reachable files), though maybe I need to 
redo that PR. I wasn't sure how much of a bottleneck getting all_manifests was.
   
   Yeah, scanning the all_manifests table twice was the major problem for me.
   
   
   > Anyway, I agree with @RussellSpitzer that maybe cache is a better option 
than persist? It'd be great to see some numbers for tables with huge snapshots 
for these two options vs. today, if possible. I think, if we go with this 
approach, it should probably be 1) configurable, and 2) able to be GC'ed sooner 
rather than later.
   
   Sure, I will make caching a configurable option and run a local 
performance comparison on a table with a large number of snapshots. I will 
work on this over the weekend.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

