codejoyan edited a comment on issue #3581:
URL: https://github.com/apache/hudi/issues/3581#issuecomment-922282821
Thanks for the response, @umehrot2 and @xushiyan. I am using 0.9.0 but am still
observing `Listing leaf files and directories` even after making the changes
you suggested. Below are the code snippet and the Spark UI details:
```
scala> import org.apache.hudi.DataSourceReadOptions
import org.apache.hudi.DataSourceReadOptions
scala> spark.sql("SET hoodie.metadata.enable=true")
res0: org.apache.spark.sql.DataFrame = [key: string, value: string]
scala> spark.sql("SET hoodie.metadata.metrics.enable=true")
res1: org.apache.spark.sql.DataFrame = [key: string, value: string]
scala> spark.time(spark.read.format("hudi").option("hoodie.file.index.enable", "true").load("gs://udp-hudi-storage3/store_visit_scan_hudi_spark_3_tgt_v3/*/*/*"))
21/09/18 14:13:24 WARN org.apache.spark.sql.execution.datasources.SharedInMemoryCache: Evicting cached table partition metadata from memory due to size constraints (spark.sql.hive.filesourcePartitionFileCacheSize = 262144000 bytes). This may impact query planning performance.
Time taken: 144686 ms
res2: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 124 more fields]
```
Corresponding stage details on Spark UI for the above command:
<img width="1715" alt="Screenshot 2021-09-18 at 7 46 45 PM"
src="https://user-images.githubusercontent.com/48707638/133891815-d0a025a0-a51d-46d0-9589-306082287e35.png">