umehrot2 commented on issue #3581:
URL: https://github.com/apache/hudi/issues/3581#issuecomment-922046864


   @codejoyan since you are observing `Listing leaf files...`, your code is 
using `InMemoryFileIndex` instead of `HoodieFileIndex`. I suspect you are 
testing with an older version of Hudi rather than Hudi 0.9.0. In Hudi 0.9.0, 
you can enable metadata listing simply by running `SET 
hoodie.metadata.enable=true` in Spark SQL.
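   
   A minimal PySpark sketch of the 0.9.0 path, assuming a `SparkSession` 
with the Hudi 0.9.0 bundle on the classpath. The helper name 
`enable_metadata_listing` and the `base_path` argument are illustrative, not 
part of Hudi:

```python
def enable_metadata_listing(spark, base_path):
    """Enable metadata-based listing for the session, then load the table.

    `spark` is assumed to be a SparkSession running with Hudi 0.9.0;
    `base_path` is a hypothetical table location supplied by the caller.
    """
    # Same statement as quoted in the comment, issued through spark.sql().
    spark.sql("SET hoodie.metadata.enable=true")
    # Load the table through the "hudi" data source. If the plan still logs
    # `Listing leaf files...`, the read is likely not using HoodieFileIndex.
    return spark.read.format("hudi").load(base_path)
```

   Calling it as `enable_metadata_listing(spark, "s3://bucket/path/to/table")` 
(path made up) returns a DataFrame you can query as usual.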
   
   If you are using an earlier version of Hudi, i.e. 0.8.0 or 0.7.0, it does 
not have `HoodieFileIndex`. To obtain the best listing performance there, you 
should use the Hoodie RO Path Filter (if using a COW table): 
https://hudi.apache.org/docs/querying_data/#spark-sql. To additionally enable 
metadata listing in release 0.8.0 or 0.7.0 (for either COW or MOR), you need 
to pass it as a Hadoop conf: `spark.hadoop.hoodie.metadata.enable`. However, 
you will only see the main benefits of metadata listing from Hudi 0.9.0 
onwards, with the introduction of `HoodieFileIndex`.
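   
   As a rough sketch, the 0.7.0 / 0.8.0 settings could be assembled as 
`--conf` arguments like this. The helper names are hypothetical. The path 
filter key and class come from the Hudi querying docs, but passing them 
through the `spark.hadoop.` prefix (rather than `setClass` on the Hadoop 
configuration) is an assumption here:

```python
def legacy_listing_confs(table_type):
    """Spark confs for Hudi 0.7.0 / 0.8.0, following the comment above."""
    confs = {
        # Spark copies keys prefixed with "spark.hadoop." into the Hadoop conf.
        "spark.hadoop.hoodie.metadata.enable": "true",
    }
    if table_type.upper() == "COW":
        # RO path filter, recommended above for COW tables only.
        # Assumption: set via the spark.hadoop. prefix instead of setClass().
        confs["spark.hadoop.mapreduce.input.pathFilter.class"] = (
            "org.apache.hudi.hadoop.HoodieROTablePathFilter"
        )
    return confs


def as_submit_args(confs):
    """Render a conf dict as a flat list of spark-submit `--conf` arguments."""
    args = []
    for key in sorted(confs):
        args += ["--conf", f"{key}={confs[key]}"]
    return args
```

   For example, `as_submit_args(legacy_listing_confs("COW"))` yields the 
arguments to append to a `spark-submit` or `spark-shell` command line.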
   
   


