sivabalan narayanan created HUDI-5436:
-----------------------------------------

             Summary: Auto repair tool for MDT out of sync
                 Key: HUDI-5436
                 URL: https://issues.apache.org/jira/browse/HUDI-5436
             Project: Apache Hudi
          Issue Type: Bug
          Components: metadata
            Reporter: sivabalan narayanan


Can we write a spark-submit job to repair out-of-sync issues with the metadata
table (MDT)? For example, if MDT validation fails for a given table, we
currently don't have a good way to fix the MDT.
So we should develop a spark-submit job that tries to deduce the commit at
which the MDT went out of sync and fixes just the delta.
 
The idea here is:
Run the validation job against the latest files as of each commit, starting
from the latest commit and going in reverse chronological order. At some point
validation will succeed. Let's call that commit N.
We can then add a savepoint to the MDT at commit N and restore the table to
commit N.
Then we can take any commits after commit N from the data table and apply
them one by one to the MDT.
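
To make the flow above concrete, here is a rough sketch of the control logic
only. It does not use any Hudi API: `validate`, `savepoint_and_restore` and
`apply_commit` are hypothetical callables standing in for the validation job,
the savepoint/restore step, and replaying one data table commit onto the MDT.
Raising an error when no commit validates is also an assumption, since the
proposal does not say what should happen in that case.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def find_last_good_commit(
    commits: Sequence[str],
    validate: Callable[[str], bool],
) -> Optional[str]:
    """Walk the timeline newest-first; return the first commit that validates."""
    for commit in reversed(commits):
        if validate(commit):
            return commit
    return None


def plan_repair(
    commits: Sequence[str],
    validate: Callable[[str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Return (commit N, commits after N that must be re-applied to the MDT)."""
    good = find_last_good_commit(commits, validate)
    if good is None:
        # No commit validated, so every commit is a candidate for replay.
        return None, list(commits)
    idx = commits.index(good)
    return good, list(commits[idx + 1:])


def repair(
    commits: Sequence[str],
    validate: Callable[[str], bool],
    savepoint_and_restore: Callable[[str], None],  # hypothetical hook
    apply_commit: Callable[[str], None],           # hypothetical hook
) -> bool:
    """Restore to commit N, replay the delta, then validate the latest commit."""
    good, to_replay = plan_repair(commits, validate)
    if good is None:
        # Assumption: without a known-good commit there is nothing to restore to.
        raise RuntimeError("no commit passed validation; cannot repair the delta")
    savepoint_and_restore(good)
    for commit in to_replay:
        apply_commit(commit)
    # Final check, mirroring the "run validation again" step.
    return validate(commits[-1])
```

`commits` is expected to be ordered oldest to newest. Since each validation
run is likely to be expensive, a real job might want to bound how far back it
walks, but that is outside what is proposed here.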
 
Once complete, we can run the validation tool again to ensure the MDT is in
good shape.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
