yihua opened a new pull request, #7436:
URL: https://github.com/apache/hudi/pull/7436
### Change Logs
Currently, only the log file reader is cached inside
`HoodieBackedTableMetadata`. Each time the metadata table (MT) is looked up
with `getRecordByKey` or `getRecordsByKeyPrefixes` in
`HoodieBackedTableMetadata`, a new file system view is constructed, so the
corresponding MT partition is listed again through
`HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices`. These repeated
FS list calls on MT partitions are avoidable. They increase the latency of
reading the metadata table and of listing files for the data table, which
affects, for example, Presto query latency. The sample S3 access log from
Presto below shows the `files` partition in the MT being listed.
```
2022-11-24T22:06:43.009Z INFO hive-hive-2 org.apache.hudi.common.table.view.AbstractTableFileSystemView Building file system view for partition (files)
2022-11-24T22:06:43.009Z DEBUG hive-hive-2 com.amazonaws.request Sending Request: GET https://<redacted>.s3.us-east-2.amazonaws.com / Parameters: ({"prefix":["<redacted>/store_sales/.hoodie/metadata/files/"],"delimiter":["/"],"encoding-type":["url"]}Headers: (amz-sdk-invocation-id: 9e963ae0-f2e4-738e-691f-073c5a43264d, Content-Type: application/octet-stream, User-Agent: , aws-sdk-java/1.11.697 Linux/5.4.219-126.411.amzn2.x86_64 OpenJDK_64-Bit_Server_VM/25.342-b07 java/1.8.0_342 vendor/Oracle_Corporation, presto, )
2022-11-24T22:06:43.022Z DEBUG hive-hive-2 com.amazonaws.request Received successful response: 200, AWS Request ID: Y4KHZHYVG7SSB0J4
```
This PR caches the file system view of the metadata table, and thus the
latest file slices at the partition level, inside
`HoodieBackedTableMetadata`.
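
For illustration only, the caching idea can be sketched as below. This is not
the code in this PR: `PartitionFileSliceCache`, `getLatestFileSlices`, `reset`,
and the `lister` function are hypothetical names, and file slices are
represented as plain strings rather than Hudi's own types. The sketch only
shows that a per-partition cache turns repeated lookups into a single listing
call.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of per-partition caching; not the Hudi implementation. */
public class PartitionFileSliceCache {
  // Stand-in for the expensive file system listing of one MT partition.
  private final Function<String, List<String>> lister;
  // Partition name -> latest file slices, filled lazily on first lookup.
  private final Map<String, List<String>> slicesByPartition = new ConcurrentHashMap<>();

  public PartitionFileSliceCache(Function<String, List<String>> lister) {
    this.lister = lister;
  }

  /** Lists the partition on the first call only; later calls hit the cache. */
  public List<String> getLatestFileSlices(String partition) {
    return slicesByPartition.computeIfAbsent(partition, lister);
  }

  /** Drops cached entries, e.g. when the underlying table may have changed. */
  public void reset() {
    slicesByPartition.clear();
  }

  public static void main(String[] args) {
    AtomicInteger listCalls = new AtomicInteger();
    PartitionFileSliceCache cache = new PartitionFileSliceCache(partition -> {
      listCalls.incrementAndGet(); // counts simulated FS list calls
      return Collections.singletonList(partition + "/file-slice-0");
    });

    cache.getLatestFileSlices("files");
    cache.getLatestFileSlices("files"); // served from the cache
    System.out.println("list calls: " + listCalls.get());
  }
}
```

A cache like this has to be invalidated whenever the metadata table may have
changed, which is why the sketch exposes a `reset` method; how and when the
actual change refreshes its cached view is not covered here.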
### Impact
This PR avoids repeated file listing on the metadata table, which reduces
the latency of reading the metadata table. That in turn lowers the latency of
metadata-table-based file listing overall and improves query performance.
### Risk level
low
### Documentation Update
N/A
### Contributor's checklist
- [ ] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed