Alexey Kudinkin created HUDI-3717:
-------------------------------------
Summary: Avoid double-listing w/in BaseHoodieTableFileIndex
Key: HUDI-3717
URL: https://issues.apache.org/jira/browse/HUDI-3717
Project: Apache Hudi
Issue Type: Bug
Reporter: Alexey Kudinkin
Attachments: Screen Shot 2022-03-25 at 7.05.09 PM.png, Screen Shot 2022-03-25 at 7.05.43 PM.png
Currently, `BaseHoodieTableFileIndex::loadPartitionPathFiles` effectively performs file listing twice:
* Once when `getAllQueryPartitionPaths` is invoked
* A second time when `getFilesInPartitions` is invoked
While this does not result in the files being listed twice on the file system (thanks to `FileStatusCache`, if one is present), it does lead to the Metadata Table (MT) being queried twice:
!Screen Shot 2022-03-25 at 7.05.09 PM.png!
!Screen Shot 2022-03-25 at 7.05.43 PM.png!
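To make the cost concrete, here is a minimal Java sketch, assuming a toy in-memory stand-in for the MT. `FakeMetadataTable`, `listPartitionPaths`, `listFiles`, `listAll` and the two `loadWith...` helpers are hypothetical names for illustration only, not Hudi APIs, and this is not how the actual fix is implemented. It only shows that serving both the partition paths and their files from a single lookup halves the number of MT queries while returning the same result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SingleListingSketch {

    // Hypothetical stand-in for the Metadata Table (MT); each method call counts as one MT query.
    static final class FakeMetadataTable {
        private final Map<String, List<String>> filesByPartition;
        private int queries = 0;

        FakeMetadataTable(Map<String, List<String>> filesByPartition) {
            this.filesByPartition = filesByPartition;
        }

        // One query: partition paths only.
        List<String> listPartitionPaths() {
            queries++;
            return new ArrayList<>(filesByPartition.keySet());
        }

        // One query: files for the given partitions.
        Map<String, List<String>> listFiles(List<String> partitions) {
            queries++;
            Map<String, List<String>> out = new LinkedHashMap<>();
            for (String p : partitions) {
                out.put(p, filesByPartition.getOrDefault(p, List.of()));
            }
            return out;
        }

        // One query: every partition together with its files.
        Map<String, List<String>> listAll() {
            queries++;
            return new LinkedHashMap<>(filesByPartition);
        }

        int queries() {
            return queries;
        }
    }

    // Pattern described in the issue: two separate lookups, so the MT is queried twice.
    static Map<String, List<String>> loadWithTwoLookups(FakeMetadataTable mt) {
        List<String> partitions = mt.listPartitionPaths(); // MT query #1
        return mt.listFiles(partitions);                   // MT query #2
    }

    // Alternative: one lookup; the partition paths are simply the keys of the result.
    static Map<String, List<String>> loadWithOneLookup(FakeMetadataTable mt) {
        return mt.listAll(); // single MT query
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("2022/03/24", List.of("f1.parquet", "f2.parquet"));
        data.put("2022/03/25", List.of("f3.parquet"));

        FakeMetadataTable twice = new FakeMetadataTable(data);
        Map<String, List<String>> a = loadWithTwoLookups(twice);

        FakeMetadataTable once = new FakeMetadataTable(data);
        Map<String, List<String>> b = loadWithOneLookup(once);

        System.out.println("two lookups: " + twice.queries() + " MT queries");
        System.out.println("one lookup:  " + once.queries() + " MT queries");
        System.out.println("same result: " + a.equals(b));
    }
}
```

Running `main` reports 2 MT queries for the two-lookup path and 1 for the single-lookup path, with identical results. In a real file index a query with partition filters may prune partitions before fetching files, so a single "list everything" lookup is not automatically cheaper; caching the first listing and reusing it for the second step is another way to avoid the repeated query.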
--
This message was sent by Atlassian Jira
(v8.20.1#820001)