yihua opened a new pull request, #7488: URL: https://github.com/apache/hudi/pull/7488
### Change Logs

Currently, on the reader or query engine side, direct file listing on the file system is used by default, as indicated by `HoodieMetadataConfig.DEFAULT_METADATA_ENABLE_FOR_READERS` (`=false`). Unless `hoodie.metadata.enable` is set explicitly, metadata-table-based file listing is disabled. However, `BaseHoodieTableFileIndex`, the common File Index implementation used by the Trino Hive connector, does not respect this default. This causes a query latency regression in the Trino Hive connector, due to how the connector is integrated with the Input Format and the File Index when the metadata table is enabled.

This PR fixes `BaseHoodieTableFileIndex` to respect the expected behavior defined by `HoodieMetadataConfig.DEFAULT_METADATA_ENABLE_FOR_READERS`, i.e., metadata-table-based file listing is disabled by default. Metadata-table-based file listing is enabled only when `hoodie.metadata.enable` is set to `true` and the files partition of the metadata table is ready for reads according to the Hudi table config.

### Impact

This mitigates the query latency regression in the Trino Hive connector and fixes the read-side behavior of file listing.

### Risk level

low

### Documentation Update

N/A

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
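
For illustration only, the read-side rule described under Change Logs can be sketched as a small standalone check. This is not the actual `BaseHoodieTableFileIndex` code: the class `MetadataListingSketch`, the method `shouldUseMetadataListing`, and the `filesPartitionReady` flag are hypothetical names, and the local constant only mirrors the `=false` default quoted above. The sketch assumes the reader config arrives as plain `java.util.Properties`.

```java
import java.util.Properties;

/** Hypothetical sketch of the read-side decision; not Hudi's implementation. */
final class MetadataListingSketch {
    // Mirrors HoodieMetadataConfig.DEFAULT_METADATA_ENABLE_FOR_READERS (=false) as quoted in the PR.
    static final boolean DEFAULT_METADATA_ENABLE_FOR_READERS = false;
    // Config key named in the PR description.
    static final String METADATA_ENABLE_KEY = "hoodie.metadata.enable";

    /**
     * Returns true only when the reader explicitly enables the metadata table
     * AND the files partition is ready (the readiness flag is an assumed input).
     */
    static boolean shouldUseMetadataListing(Properties readerProps, boolean filesPartitionReady) {
        String configured = readerProps.getProperty(METADATA_ENABLE_KEY);
        boolean enabled = (configured == null)
            ? DEFAULT_METADATA_ENABLE_FOR_READERS      // no explicit config: fall back to the reader default
            : Boolean.parseBoolean(configured.trim()); // explicit config wins
        return enabled && filesPartitionReady;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // No explicit config: direct file system listing is used.
        System.out.println(shouldUseMetadataListing(props, true));  // false
        props.setProperty(METADATA_ENABLE_KEY, "true");
        // Explicitly enabled and files partition ready.
        System.out.println(shouldUseMetadataListing(props, true));  // true
        // Explicitly enabled but files partition not ready.
        System.out.println(shouldUseMetadataListing(props, false)); // false
    }
}
```

The point of the sketch is that an absent `hoodie.metadata.enable` falls back to the reader default instead of implicitly turning metadata-table-based listing on.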
