yihua opened a new pull request, #7744:
URL: https://github.com/apache/hudi/pull/7744

   ### Change Logs
   
   When the metadata table is enabled and used for getting the partition paths 
under certain directories (listing by partition path prefix in 
`SparkHoodieTableFileIndex` and getting query partition paths in 
`BaseHoodieTableFileIndex`), the following logic in `HoodieBackedTableMetadata` 
is invoked
   
   ```java
     public List<String> getPartitionPathsWithPrefixes(List<String> prefixes) 
throws IOException {
       return getAllPartitionPaths().stream()
           .filter(p -> prefixes.stream().anyMatch(p::startsWith))
           .collect(Collectors.toList());
     }
   ```
   
   If the partition paths include `1`, `10`, and `100`, listing with the prefix `1` 
returns all three, which is incorrect.  Each entry in `prefixes` should be matched 
as an exact relative path, and the parameter name itself is misleading.
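   The faulty `startsWith` filter can be reproduced in isolation (a standalone sketch with hypothetical class and method names, not actual Hudi code):
   
   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.stream.Collectors;
   
   public class PrefixBugDemo {
       // Mirrors the buggy filter: any partition path that merely starts
       // with the string "1" is kept, not just the directory "1" itself.
       static List<String> filterByPrefix(List<String> partitions, List<String> prefixes) {
           return partitions.stream()
               .filter(p -> prefixes.stream().anyMatch(p::startsWith))
               .collect(Collectors.toList());
       }
   
       public static void main(String[] args) {
           List<String> partitions = Arrays.asList("1", "10", "100");
           // Listing under directory "1" should return only "1",
           // but the prefix match keeps all three entries.
           System.out.println(filterByPrefix(partitions, Arrays.asList("1")));
           // prints [1, 10, 100]
       }
   }
   ```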
   
   This PR makes the following changes to correct the issue:
   - Renames `getPartitionPathsWithPrefixes` to `getPartitionPathsInDirs` in 
`HoodieTableMetadata` and updates the relevant variable names and docs
   - Fixes the logic in `HoodieBackedTableMetadata#getPartitionPathsInDirs` to 
perform exact parent-directory matching
   - Adds new tests in `TestHoodieFileIndex` to guard the logic.  These 
tests fail without this fix.
   
   This PR fixes #7298.
   
   ### Impact
   
   Fixes the bug where irrelevant partition paths are returned for a given 
relative path.
   
   ### Risk level
   
   Low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   