vinothchandar commented on issue #3581:
URL: https://github.com/apache/hudi/issues/3581#issuecomment-910387627


   @codejoyan Compression is a key thing to align to ensure an apples-to-apples 
comparison; glad you got the storage issue under control. 
   
   So the time for approach 2 seems more like 2 mins? (Assuming the first column 
is the submission time.) 
   
   To reduce the listing cost, Hudi does have a 
[metadata table](http://hudi.apache.org/docs/configurations#hoodiemetadataenable) 
that can serve file listings without going to cloud storage. We can try this 
out. I think even with the file index, the listing is fetched/refreshed at least once.  
   
   When writing to and querying the dataset, use `hoodie.metadata.enable=true`.
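
   For reference, a minimal PySpark-style sketch of passing that flag on both the write and the read path. The table name, field names and base path are hypothetical placeholders, not taken from this issue, and the helper functions are only illustrative wrappers around the standard datasource options:

```python
# Config key named in the comment above; everything else here is illustrative.
METADATA_ENABLE = "hoodie.metadata.enable"


def hudi_write_options(table_name, record_key, precombine_field, partition_field):
    """Build Hudi datasource write options with the metadata table enabled."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        METADATA_ENABLE: "true",
    }


def hudi_read_options():
    """Build read options so queries also take listings from the metadata table."""
    return {METADATA_ENABLE: "true"}


def write_hudi(df, base_path, options):
    # df is assumed to be a Spark DataFrame; base_path is a hypothetical table location.
    df.write.format("hudi").options(**options).mode("append").save(base_path)


def read_hudi(spark, base_path):
    # spark is assumed to be an active SparkSession with the Hudi bundle on the classpath.
    return spark.read.format("hudi").options(**hudi_read_options()).load(base_path)
```

   The flag needs to be set on both sides: the writer maintains the metadata table, and the reader only uses it for listings when the option is also passed at query time.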
   
   But the bigger cost is the 1 minute between 0 and 1. That is puzzling. 
   
   cc @umehrot2, I know you tested all this out. Wondering if you have any 
insights.
   cc @nsivabalan as FYI, given you are looking into all things metadata table. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

