satishkotha edited a comment on issue #4109:
URL: https://github.com/apache/hudi/issues/4109#issuecomment-978407363


   The `ConcurrentModificationException` seems to be coming from here:
   
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTablePreCommitFileSystemView.java#L83
   
   We need to rework this logic to avoid calling `newFilesWrittenForPartition.remove(...)`.
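
   As a minimal standalone sketch (not Hudi code) of why the exception occurs: removing entries from a plain `HashMap` while iterating over it fails fast with a `ConcurrentModificationException`, whereas a `ConcurrentHashMap` iterator is weakly consistent and tolerates concurrent modification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CmeDemo {
    // Returns true if iterating the map while removing entries completes,
    // false if a ConcurrentModificationException is thrown.
    static boolean completesWithoutCme(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural modification mid-iteration
            }
            return true;
        } catch (java.util.ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(completesWithoutCme(new HashMap<>()));           // false
        System.out.println(completesWithoutCme(new ConcurrentHashMap<>())); // true
    }
}
```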
 
   
   A simple option to try out: replace line 72 with
   ```
   Map<String, HoodieBaseFile> newFilesWrittenForPartition = new ConcurrentHashMap<>(filesWritten.stream()
           .filter(file -> partitionStr.equals(file.getPartitionPath()))
           .collect(Collectors.toMap(HoodieWriteStat::getFileId, writeStat ->
               new HoodieBaseFile(new Path(tableMetaClient.getBasePath(), writeStat.getPath()).toString()))));
   ```
   
   The above is more of a short-term workaround. A better option is probably to avoid modifying the map in the first place. This can be done by grouping by fileId, i.e., replacing lines 78-88 with:
   
   ```
   Map<String, HoodieBaseFile> baseFilesForCommittedFileIds = committedBaseFiles
           // Remove files replaced by the current inflight commit
           .filter(baseFile -> !replacedFileIdsForPartition.contains(baseFile.getFileId()))
           .collect(Collectors.toMap(HoodieBaseFile::getFileId, baseFile -> baseFile));

   // New files written by the inflight commit take precedence for the same fileId
   baseFilesForCommittedFileIds.putAll(newFilesWrittenForPartition);
   return baseFilesForCommittedFileIds.values().stream();
   ```
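
   The same grouping idea can be sketched standalone with plain strings in place of the Hudi types (the names, the `latestFiles` helper, and the `fileId_commitTime` naming scheme below are illustrative assumptions, not Hudi's actual API): key committed files by fileId, drop replaced fileIds, then let the inflight commit's new files shadow committed versions of the same fileId via `putAll`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MergeByFileIdSketch {
    // Hypothetical stand-in for the proposed logic: committed file names keyed
    // by fileId, minus replaced fileIds, overridden by newly written files.
    static List<String> latestFiles(List<String> committed,
                                    Set<String> replacedFileIds,
                                    Map<String, String> newFilesByFileId) {
        Map<String, String> byFileId = committed.stream()
                .filter(f -> !replacedFileIds.contains(fileId(f)))
                .collect(Collectors.toMap(MergeByFileIdSketch::fileId, Function.identity()));
        // New files for the inflight commit shadow any committed file with the same fileId
        byFileId.putAll(newFilesByFileId);
        return new ArrayList<>(byFileId.values());
    }

    // Assumes an illustrative "fileId_commitTime" naming scheme
    static String fileId(String fileName) {
        return fileName.substring(0, fileName.indexOf('_'));
    }
}
```

   Because each fileId appears exactly once as a map key, no entry ever needs to be removed after the map is built, which sidesteps the concurrent-modification problem entirely.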
   This needs some more testing. I can send a PR next week.
   

