VitoMakarevich opened a new pull request, #10151:
URL: https://github.com/apache/hudi/pull/10151

   ### Change Logs
   
   This is a fix for issue 
[10088](https://github.com/apache/hudi/issues/10088)/[HUDI-7034](https://issues.apache.org/jira/browse/HUDI-7034).
   The issue is reproducible; see the reproduction 
[here](https://github.com/VitoMakarevich/hudi-incremental-issue/blob/master/src/main/scala/com/example/hudi/HudiRefreshBug.scala).
   The impact is that once a partition's file slices are loaded, they never 
change (the filenames stay the same). As a result, partitions do not receive 
updates, and later we start getting `FileNotFoundException` because some of the 
files the index is stuck on have since been cleaned.
   The fix invalidates the cached list of files per partition, so the 
subsequent call to `getAllQueryPartitionPaths` refreshes the list of 
partitions. The code below then finds no cached file slices for any partition 
and reloads the list of files:
   ```
       List<PartitionPath> missingPartitions = partitionPaths.stream()
           .filter(p -> !cachedAllInputFileSlices.containsKey(p))
           .collect(Collectors.toList());
   ```
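
   To illustrate the mechanism, here is a minimal, self-contained sketch. It is 
not Hudi's actual implementation: only the `cachedAllInputFileSlices` name and 
the `missingPartitions` filter come from the snippet above, while 
`FileSliceCacheSketch`, `lister`, `getFileSlices` and `refresh` are hypothetical 
names, and partitions and file slices are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, simplified model of an index that caches file slices per partition.
class FileSliceCacheSketch {
    // Partition path -> cached file slice names (strings stand in for real file slices).
    private final Map<String, List<String>> cachedAllInputFileSlices = new HashMap<>();
    // Assumption: stands in for whatever lists the current files of a partition.
    private final Function<String, List<String>> lister;

    FileSliceCacheSketch(Function<String, List<String>> lister) {
        this.lister = lister;
    }

    // Lists files only for partitions that are not in the cache yet.
    List<String> getFileSlices(List<String> partitionPaths) {
        List<String> missingPartitions = partitionPaths.stream()
            .filter(p -> !cachedAllInputFileSlices.containsKey(p))
            .collect(Collectors.toList());
        for (String p : missingPartitions) {
            cachedAllInputFileSlices.put(p, new ArrayList<>(lister.apply(p)));
        }
        return partitionPaths.stream()
            .flatMap(p -> cachedAllInputFileSlices.get(p).stream())
            .collect(Collectors.toList());
    }

    // Clearing the cache makes every partition "missing" on the next call,
    // so its files are listed again.
    void refresh() {
        cachedAllInputFileSlices.clear();
    }
}
```

   In this model, without the `clear()` in `refresh()`, a partition that was 
loaded once keeps returning its old filenames even after the underlying 
listing changes, which mirrors the stale-index behaviour described above.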
   
   No test has been added for this yet; I will try to add a unit test as well 
as a direct `refresh view` test.
   
   ### Impact
   
   There is no user-facing change, but I don't know all the code paths in 
which this code is used.
   
   ### Risk level (write none, low, medium or high below)
   
   Hard to assess
   
   ### Documentation Update
   
    No need.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

