hj2016 commented on a change in pull request #2188:
URL: https://github.com/apache/hudi/pull/2188#discussion_r551674028



##########
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/hbase/SparkHoodieHBaseIndex.java
##########
@@ -480,6 +486,68 @@ private Integer getNumRegionServersAliveForTable() {
   @Override
   public boolean rollbackCommit(String instantTime) {
     // Rollback in HbaseIndex is managed via method {@link #checkIfValidCommit()}
+    synchronized (SparkHoodieHBaseIndex.class) {

Review comment:
       This will still cause problems with the HBase index. The case that needs
a rollback is when updating the partition path in the HBase index is turned on
and the commit fails after the index has already been written to HBase (for
example because of a JVM out-of-memory error or a sudden HBase crash). Suppose
the data starts as id:1, partition:2019. A later commit fails, but its index
entry has already been written to HBase, so the index now points id:1 at
partition 2020. The next write for that record then goes only to the 2020
partition, while the old copy stays in 2019, which results in duplicate data.
With the check on the rollbackSync parameter, the logic below it is not
executed when the parameter is false. So if you set
hbase.index.rollback.sync = false and
hoodie.hbase.index.update.partition.path = true, the problem remains. I think
it would be more reasonable to write it like this:
    
   if (!config.getHbaseIndexUpdatePartitionPath()) {
     return true;
   }
   synchronized (SparkHoodieHBaseIndex.class) {
     ....
   }
   return true;
    
   This is because problems can occur only when a record's partition changes.
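
   To make the scenario above concrete, here is a minimal toy model in plain
   Java. It is a sketch under stated assumptions, not Hudi code: the class
   HBaseIndexRollbackSketch, its fields, and the methods upsert, copiesOf and
   runScenario are all hypothetical, the HBase table is replaced by an
   in-memory map, and the validity check only loosely mirrors the idea behind
   checkIfValidCommit (an index entry is ignored if its instant never
   completed). The rollbackIndex flag stands in for "the index rollback logic
   actually runs".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: every name here is hypothetical and none of it is Hudi API.
final class HBaseIndexRollbackSketch {

  // One index entry: where the key lives and which instant wrote the entry.
  static final class Location {
    final String partition;
    final String instant;

    Location(String partition, String instant) {
      this.partition = partition;
      this.instant = instant;
    }
  }

  private final Map<String, Location> index = new HashMap<>();       // stand-in for the HBase index
  private final Set<String> completedInstants = new HashSet<>();     // stand-in for the timeline
  private final Map<String, Set<String>> table = new HashMap<>();    // partition -> record keys

  void upsert(String key, String partition, String instant, boolean fail, boolean rollbackIndex) {
    Location previous = index.get(key);
    // An entry written by an instant that never completed is treated as "not found".
    Location valid =
        (previous != null && completedInstants.contains(previous.instant)) ? previous : null;

    // Assumption: the index is written before the commit completes.
    index.put(key, new Location(partition, instant));

    if (fail) {
      if (rollbackIndex) {
        // Restore the entry that existed before the failed commit.
        if (previous == null) {
          index.remove(key);
        } else {
          index.put(key, previous);
        }
      }
      return; // the failed commit's data never becomes visible in the table
    }

    // Partition change: delete the old copy, then insert into the new partition.
    if (valid != null && !valid.partition.equals(partition)) {
      table.get(valid.partition).remove(key);
    }
    table.computeIfAbsent(partition, p -> new HashSet<>()).add(key);
    completedInstants.add(instant);
  }

  // Number of partitions that currently hold a copy of the key.
  int copiesOf(String key) {
    int copies = 0;
    for (Set<String> keys : table.values()) {
      if (keys.contains(key)) {
        copies++;
      }
    }
    return copies;
  }

  // Replays the scenario from the comment and returns how many copies of id 1 exist.
  static int runScenario(boolean rollbackIndex) {
    HBaseIndexRollbackSketch sketch = new HBaseIndexRollbackSketch();
    sketch.upsert("1", "2019", "001", false, rollbackIndex); // initial data
    sketch.upsert("1", "2020", "002", true, rollbackIndex);  // fails after the index write
    sketch.upsert("1", "2020", "003", false, rollbackIndex); // next write
    return sketch.copiesOf("1");
  }

  public static void main(String[] args) {
    System.out.println("copies without index rollback: " + runScenario(false));
    System.out.println("copies with index rollback:    " + runScenario(true));
  }
}
```

   In this toy model, skipping the index rollback leaves id 1 in both 2019 and
   2020, while restoring the previous index entry lets the next write remove
   the 2019 copy. Real Hudi behaviour depends on details this sketch leaves
   out.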





