yihua opened a new pull request, #7488:
URL: https://github.com/apache/hudi/pull/7488

   ### Change Logs
   
   Currently, on the reader (query engine) side, direct file listing on the 
file system is used by default, as indicated by 
`HoodieMetadataConfig.DEFAULT_METADATA_ENABLE_FOR_READERS` (`=false`).  Unless 
`hoodie.metadata.enable` is set explicitly, metadata-table-based file listing 
is disabled.  However, `BaseHoodieTableFileIndex`, the common File Index 
implementation used by the Trino Hive connector, does not respect this default.  
This causes a query latency regression in the Trino Hive connector, because of 
how the connector integrates with the Input Format and the File Index when the 
metadata table is enabled.
   
   This PR fixes `BaseHoodieTableFileIndex` to respect the expected behavior 
defined by `HoodieMetadataConfig.DEFAULT_METADATA_ENABLE_FOR_READERS`, i.e., 
metadata-table-based file listing is disabled by default.  It is enabled only 
when `hoodie.metadata.enable` is set to true and the files partition of the 
metadata table is ready for reading, based on the Hudi table config.
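
   The decision rule described above can be sketched roughly as follows.  This 
is a standalone illustration, not the code in this PR: the class 
`MetadataListingSketch`, the method `useMetadataListing`, and the 
`filesPartitionReady` flag are hypothetical names, and only the config key 
`hoodie.metadata.enable` and the reader-side default of `false` come from the 
description.

```java
import java.util.Properties;

// Hypothetical sketch; it does not use or reproduce any Hudi class.
public class MetadataListingSketch {
    // Config key named in the PR description.
    static final String METADATA_ENABLE_KEY = "hoodie.metadata.enable";
    // Mirrors the reader-side default stated in the description (false).
    static final boolean DEFAULT_METADATA_ENABLE_FOR_READERS = false;

    // Returns true only when the reader opted in explicitly AND the files
    // partition of the metadata table is ready. How readiness is derived from
    // the table config is outside this sketch, so it is passed in as a flag.
    static boolean useMetadataListing(Properties readerProps, boolean filesPartitionReady) {
        String raw = readerProps.getProperty(METADATA_ENABLE_KEY);
        boolean enabled = (raw == null)
            ? DEFAULT_METADATA_ENABLE_FOR_READERS
            : Boolean.parseBoolean(raw);
        return enabled && filesPartitionReady;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // No explicit config: falls back to direct file listing.
        System.out.println(useMetadataListing(props, true));   // false
        props.setProperty(METADATA_ENABLE_KEY, "true");
        // Explicit opt-in and files partition ready.
        System.out.println(useMetadataListing(props, true));   // true
        // Explicit opt-in, but files partition not ready.
        System.out.println(useMetadataListing(props, false));  // false
    }
}
```

   With no explicit config the sketch falls back to direct file listing, which 
is the reader-side default that the file index is expected to honor.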
   
   ### Impact
   
   This mitigates the query latency regression in the Trino Hive connector and 
fixes the read-side file listing behavior.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
