namuny opened a new issue, #5776:
URL: https://github.com/apache/hudi/issues/5776

   I'm seeing a steep increase in the time it takes to list partitions during 
clustering, specifically after [this 
PR](https://github.com/apache/hudi/pull/4643) was merged. I have yet to get to 
the bottom of exactly why, but reverting the implementation of 
`FileSystemBackedTableMetadata.getAllPartitionPaths` to the 0.9.0 
implementation makes partition listing much faster.
   
   **Test results**:
   * 0.9.0 approach (but using 0.11.0 for everything else) - 50 seconds to list 
partitions
   * Pure 0.11.0 approach - over 20 minutes to list partitions
   
   **My setup**:
   * Hudi 0.11.0
   * CoW + inline clustering
   * Metadata table is disabled
   * The test results above are for 10,000 partitions on S3.
   
   Regardless of why the metadata table is disabled, I'd like to understand 
why the partition listing time for 10,000 partitions goes from under a minute 
to 20+ minutes.
   
   
   **Expected behavior**
   
   There should not be a performance degradation when listing partitions for 
operations such as clustering.
   
   **Environment Description**
   
   * Hudi version : 0.11.0
   
   * Spark version : 3.1.2
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
