[GitHub] [hudi] codejoyan commented on issue #3581: [SUPPORT] Slow snapshot query performance

GitBox Sat, 18 Sep 2021 07:17:52 -0700


codejoyan commented on issue #3581:
URL: https://github.com/apache/hudi/issues/3581#issuecomment-922282821



   Thanks for the response @umehrot2 and @xushiyan. I am using 0.9.0 but am still 
observing `Listing leaf files and directories` even after making the changes 
you suggested. Below are the code snippet and the Spark UI details:
   
   ```
   scala> import org.apache.hudi.DataSourceReadOptions
   import org.apache.hudi.DataSourceReadOptions
   
   scala> spark.sql("SET hoodie.metadata.enable=true")
   res0: org.apache.spark.sql.DataFrame = [key: string, value: string]
   
   scala> spark.sql("SET hoodie.metadata.metrics.enable=true")
   res1: org.apache.spark.sql.DataFrame = [key: string, value: string]
   
   scala> spark.time(spark.read.format("hudi").option("hoodie.file.index.enable", true).load("gs://udp-hudi-storage3/store_visit_scan_hudi_spark_3_tgt_v3/*/*/*"))
   21/09/18 14:13:24 WARN org.apache.spark.sql.execution.datasources.SharedInMemoryCache: Evicting cached table partition metadata from memory due to size constraints (spark.sql.hive.filesourcePartitionFileCacheSize = 262144000 bytes). This may impact query planning performance.
   Time taken: 144686 ms
   res2: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 124 more fields]
   ```
   Corresponding stage details on Spark UI for the above command:
   <img width="1715" alt="Screenshot 2021-09-18 at 7 46 45 PM" src="https://user-images.githubusercontent.com/48707638/133891815-d0a025a0-a51d-46d0-9589-306082287e35.png">
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

