[ https://issues.apache.org/jira/browse/HUDI-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397509#comment-17397509 ]

ASF GitHub Bot commented on HUDI-2119:
--------------------------------------

vinothchandar commented on a change in pull request #3210:
URL: https://github.com/apache/hudi/pull/3210#discussion_r687032405



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -480,6 +482,131 @@ public void testRollbackUnsyncedCommit(HoodieTableType tableType) throws Excepti
       client.syncTableMetadata();
       validateMetadata(client);
     }
+
+    // If an unsynced commit is automatically rolled back during the next commit, the rollback commit gets a timestamp
+    // greater than the new commit that is started. Ensure that in this case the rollback is not processed,
+    // as the earlier failed commit was never committed to the metadata table.
+    //
+    //  Dataset:   C1        C2         C3.inflight[failed]   C4   R5[rolls back C3]
+    //  Metadata:  C1.delta  C2.delta
+    //
+    // When R5 completes, C3.xxx will be deleted. When C4 completes, C4 and R5 will be committed to Metadata Table in

Review comment:
       On this one, I continue to disagree :). We can easily do a logger.warn or logger.error and collect the information in an automated fashion to fix the issues. In the current model, heavy user intervention is needed to fix the metadata table and get the pipeline back online, which is not very user-friendly for folks in OSS.
   
   I am thinking of adding an internal config to make this behavior configurable, so you can keep it failing hard at Uber while we go the other way in OSS. It'll actually help us see a wider variety of issues and harden the implementation. wdyt?




-- 
This is an automated message from the Apache Git Service.


> Syncing of rollbacks to metadata table does not work in all cases
> -----------------------------------------------------------------
>
>                 Key: HUDI-2119
>                 URL: https://issues.apache.org/jira/browse/HUDI-2119
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Assignee: Prashant Wason
>            Priority: Blocker
>              Labels: pull-request-available, release-blocker
>             Fix For: 0.9.0
>
>
> This is an issue with inline automatic rollbacks.
> Metadata table assumes that a rollback is to be applied if the
> instant-being-rolled-back has a timestamp less than the last deltacommit time
> on the metadata timeline. We do not explicitly check if the
> instant-being-rolled-back was actually written to metadata table.
> A rollback adds a record to metadata table which "deletes" files from a
> failed/earlier commit. If the files being deleted were never actually
> committed to metadata table earlier, the deletes cannot be consolidated
> during metadata table reads. This leads to a HoodieMetadataException, as we
> cannot differentiate this case from a bug where we might have missed
> committing a commit to metadata table.
>  
>  
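The difference between the timestamp assumption described in the issue and an explicit "was it synced?" check can be sketched in Java using the timeline from the test comment above (C1, C2 and C4 synced to the metadata table; C3 failed and was never synced; R5 rolls back C3). The timestamps "001" to "005" are hypothetical, instant times are compared as strings on the assumption that they are fixed-width, and `RollbackSyncCheck` with its methods is illustrative only, not a Hudi API.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the rollback-sync decision; not Hudi's implementation.
public class RollbackSyncCheck {

    // Rule described in HUDI-2119: apply the rollback if the rolled-back instant
    // is older than the latest deltacommit on the metadata timeline.
    static boolean shouldApplyByTimestamp(String rolledBackInstant, String lastDeltaCommit) {
        return rolledBackInstant.compareTo(lastDeltaCommit) < 0;
    }

    // Explicit check: apply the rollback only if the rolled-back instant was
    // actually written to the metadata table.
    static boolean shouldApplyIfSynced(String rolledBackInstant, Set<String> syncedInstants) {
        return syncedInstants.contains(rolledBackInstant);
    }

    public static void main(String[] args) {
        // Metadata timeline after C4 has been synced; C3 failed and never reached it.
        TreeSet<String> synced = new TreeSet<>(Set.of("001", "002", "004"));
        String lastDeltaCommit = synced.last(); // "004"
        String rolledBack = "003";              // C3, rolled back by R5

        System.out.println("timestamp rule applies rollback: "
            + shouldApplyByTimestamp(rolledBack, lastDeltaCommit)); // true (wrong)
        System.out.println("synced-set rule applies rollback: "
            + shouldApplyIfSynced(rolledBack, synced));             // false
    }
}
```

Under the timestamp rule the rollback of C3 is applied even though C3's files were never recorded, which produces exactly the unconsolidated deletes described above; checking membership in the set of synced instants skips it.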



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
