yihua opened a new pull request, #7436:
URL: https://github.com/apache/hudi/pull/7436

   ### Change Logs
   
   Currently, only the log file reader is cached inside 
`HoodieBackedTableMetadata`.  Each time the metadata table (MT) is looked up 
with `getRecordByKey` or `getRecordsByKeyPrefixes` in 
`HoodieBackedTableMetadata`, a new file system view is constructed, so the 
corresponding MT partition is listed again through 
`HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices`.  These repeated 
FS list calls on MT partitions are avoidable, and they increase the latency of 
reading the metadata table and of listing files for the data table, which 
affects Presto query latency, for example.  A sample S3 access log from Presto 
for listing the `files` partition in MT is shown below.
   
   ```
   2022-11-24T22:06:43.009Z     INFO    hive-hive-2     
org.apache.hudi.common.table.view.AbstractTableFileSystemView   Building file 
system view for partition (files)
   2022-11-24T22:06:43.009Z     DEBUG   hive-hive-2     com.amazonaws.request   
Sending Request: GET https://<redacted>.s3.us-east-2.amazonaws.com / 
Parameters: 
({"prefix":["<redacted>/store_sales/.hoodie/metadata/files/"],"delimiter":["/"],"encoding-type":["url"]}Headers:
 (amz-sdk-invocation-id: 9e963ae0-f2e4-738e-691f-073c5a43264d, Content-Type: 
application/octet-stream, User-Agent: , aws-sdk-java/1.11.697 
Linux/5.4.219-126.411.amzn2.x86_64 OpenJDK_64-Bit_Server_VM/25.342-b07 
java/1.8.0_342 vendor/Oracle_Corporation, presto, ) 
   2022-11-24T22:06:43.022Z     DEBUG   hive-hive-2     com.amazonaws.request   
Received successful response: 200, AWS Request ID: Y4KHZHYVG7SSB0J4
   ```
   
   This PR caches the file system view of the metadata table, and thus the 
latest file slices at the partition level for the metadata table, inside 
`HoodieBackedTableMetadata`.
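   
   To illustrate the idea only, here is a minimal, self-contained sketch of 
caching a per-partition listing so that repeated lookups do not list the same 
partition again.  The names `PartitionSliceCache`, `getLatestSlices`, and 
`reset` are hypothetical and are not the classes or methods changed in this PR; 
the lister function stands in for the FS list call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration; not Hudi's actual classes or API.
class PartitionSliceCache {
  // Stand-in for the expensive FS listing of one metadata table partition.
  private final Function<String, List<String>> lister;
  // Partition name -> latest file slices, filled on first access.
  private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

  PartitionSliceCache(Function<String, List<String>> lister) {
    this.lister = lister;
  }

  // Lists the partition only if it has not been listed before.
  List<String> getLatestSlices(String partition) {
    return cache.computeIfAbsent(partition, lister);
  }

  // Drops cached listings, e.g. when the underlying table has changed.
  void reset() {
    cache.clear();
  }
}

class PartitionSliceCacheDemo {
  public static void main(String[] args) {
    AtomicInteger listCalls = new AtomicInteger();
    PartitionSliceCache cache = new PartitionSliceCache(partition -> {
      listCalls.incrementAndGet(); // count simulated FS list calls
      return Arrays.asList(partition + "/slice-0", partition + "/slice-1");
    });

    cache.getLatestSlices("files");
    cache.getLatestSlices("files"); // served from the cache, no second listing
    System.out.println("list calls: " + listCalls.get()); // prints "list calls: 1"
  }
}
```

   A cache like this has to be invalidated when the underlying table changes, 
which is why the sketch includes a `reset` method; how and when the real 
implementation refreshes its cached view is not covered here.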
   
   ### Impact
   
   This PR avoids repeated file listing on the metadata table and thus reduces 
the latency of reading the metadata table.  This in turn reduces the latency of 
the overall metadata-table-based file listing and improves query performance.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.