sivabalan narayanan created HUDI-5436:
-----------------------------------------
Summary: Auto repair tool for MDT out of sync
Key: HUDI-5436
URL: https://issues.apache.org/jira/browse/HUDI-5436
Project: Apache Hudi
Issue Type: Bug
Components: metadata
Reporter: sivabalan narayanan
Can we write a spark-submit job to repair out-of-sync issues with the MDT
(metadata table)? For example, if MDT validation fails for a given table, we
don't currently have a good way to fix the MDT.
So we should develop a spark-submit job that tries to deduce the commit at
which the MDT went out of sync and then fixes just the delta.
The idea is:
Run the validation job against the latest files as of each commit, starting
from the latest commit and going in reverse chronological order. At some point
validation will succeed. Let's call that commit N.
We can add a savepoint to the MDT at commit N and restore the table to commit N.
Then we can take any commits after commit N from the data table and apply
them one by one to the MDT.
Once complete, we can run the validation tool again to ensure the MDT is in good shape.
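A rough sketch of the control flow proposed here, in plain Python. This is only an illustration of the search-and-replay loop, not an implementation against Hudi: `validate_at`, `savepoint_and_restore`, and `apply_commit` are hypothetical callbacks standing in for the validation job, the MDT savepoint/restore step, and the per-commit MDT update, and `commits` is assumed to be the data table's completed commit times in chronological order.

```python
from typing import Callable, List, Optional, Sequence


def find_last_good_commit(
    commits: Sequence[str],
    validate_at: Callable[[str], bool],
) -> Optional[str]:
    """Walk commits from newest to oldest and return the first one that validates.

    `validate_at` is a hypothetical hook: it should return True when the MDT
    matches the data table's file listing as of the given commit.
    """
    for commit in reversed(commits):
        if validate_at(commit):
            return commit
    return None


def repair_mdt(
    commits: Sequence[str],
    validate_at: Callable[[str], bool],
    savepoint_and_restore: Callable[[str], None],
    apply_commit: Callable[[str], None],
) -> List[str]:
    """Restore the MDT to the last good commit N and replay later commits.

    Returns the list of commits that were replayed. All three callbacks are
    placeholders for whatever the real job would do.
    """
    good = find_last_good_commit(commits, validate_at)
    if good is None:
        # No commit validates; the delta approach does not apply.
        raise RuntimeError("no commit passed validation")

    # Savepoint the MDT at commit N and restore to it.
    savepoint_and_restore(good)

    # Apply every data table commit after N to the MDT, oldest first.
    to_replay = list(commits[commits.index(good) + 1:])
    for commit in to_replay:
        apply_commit(commit)

    # Final check: validation should now pass at the latest commit.
    if commits and not validate_at(commits[-1]):
        raise RuntimeError("MDT still out of sync after replay")
    return to_replay
```

The reverse scan is linear in the number of commits. If validation results are monotonic (every commit before the first bad one passes and every one after it fails), a binary search over the timeline would need fewer validation runs, but that assumption would have to be checked first.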
--
This message was sent by Atlassian Jira
(v8.20.10#820010)