VitoMakarevich opened a new pull request, #10151: URL: https://github.com/apache/hudi/pull/10151
### Change Logs

This is a fix for issue [10088](https://github.com/apache/hudi/issues/10088)/[HUDI-7034](https://issues.apache.org/jira/browse/HUDI-7034). The issue is reproducible; a reproduction is available [here](https://github.com/VitoMakarevich/hudi-incremental-issue/blob/master/src/main/scala/com/example/hudi/HudiRefreshBug.scala).

The impact of the issue is that once a partition's file slices are loaded, they never change (the file names stay the same). As a result, partitions do not receive updates, and later we start getting `FileNotFoundException`, because some of the files the index is stuck on have since been cleaned.

The fix invalidates the cached list of files in partitions. The subsequent call to `getAllQueryPartitionPaths` then refreshes the list of partitions, and the following code finds no cached files for any partition and reloads the list of files:

```
List<PartitionPath> missingPartitions = partitionPaths.stream()
    .filter(p -> !cachedAllInputFileSlices.containsKey(p))
    .collect(Collectors.toList());
```

No test has been created for this yet. I will try to add a unit test as well as a direct `refresh view` test.

### Impact

There is no user-facing change, but I don't know all the code paths in which this code is used.

### Risk level (write none, low medium or high below)

Hard to assess

### Documentation Update

No need.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
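
The invalidation described under Change Logs can be illustrated with a minimal sketch. This is not Hudi's actual file index: the class `FileSliceCacheSketch`, the `lister` function, and the use of plain strings for partition paths and file names are all assumptions made for illustration. Only the `cachedAllInputFileSlices` name and the `missingPartitions` filter come from the snippet in the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for a file index with a per-partition cache.
class FileSliceCacheSketch {
    // Partition path -> file names cached for that partition.
    private final Map<String, List<String>> cachedAllInputFileSlices = new HashMap<>();
    // Hypothetical loader that lists the files currently in a partition.
    private final Function<String, List<String>> lister;

    FileSliceCacheSketch(Function<String, List<String>> lister) {
        this.lister = lister;
    }

    // Returns the files for the given partitions, listing only the
    // partitions that are not in the cache yet.
    List<String> getFiles(List<String> partitionPaths) {
        List<String> missingPartitions = partitionPaths.stream()
            .filter(p -> !cachedAllInputFileSlices.containsKey(p))
            .collect(Collectors.toList());
        for (String p : missingPartitions) {
            cachedAllInputFileSlices.put(p, new ArrayList<>(lister.apply(p)));
        }
        List<String> files = new ArrayList<>();
        for (String p : partitionPaths) {
            files.addAll(cachedAllInputFileSlices.get(p));
        }
        return files;
    }

    // Without this step, a partition that was cached once is never listed
    // again, so new file names are never observed.
    void refresh() {
        cachedAllInputFileSlices.clear();
    }
}
```

In this sketch, a partition cached before a new file version is written keeps returning the old file name until `refresh()` clears the cache, at which point the `missingPartitions` filter selects every partition again and the files are re-listed.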
