Prashant,
My concern was that we should not lose metadata about clean operations.
But there is a way: as long as we faithfully copy the clean metadata that
tracks which files were cleaned and store it in the restore metadata, we
should be able to keep the metadata in sync.
Balaji.V
On Wednesday, March 18, 2020, 11:54:11 AM PDT, Prashant Wason
<[email protected]> wrote:
Thanks for the info, Vinoth / Balaji.
To me, this feels like a split between an easier-to-understand design and
the current implementation. I find it simpler to reason (based on how file
systems work in general) about restoreToInstant as a complete point-in-time
shift to the past (like restoring a file system from a snapshot/backup).
If I have restored the table to commitTime=005, then having any instants
with commitTime > 005 is confusing, as it implies that even though my table
is at an older time, some future operations will be applied to it at some
point.
I will have to read more about incremental timeline syncing and the timeline
server to understand how they use the clean instants. BTW, the comment on
the function HoodieWriteClient::restoreToInstant reads "NOTE : This action
requires all writers (ingest and compact) to a table to be stopped before
proceeding". So perhaps the embedded timeline server can simply recreate its
view the next time it comes back up?
Thanks
Prashant
On Wed, Mar 18, 2020 at 11:37 AM Balaji Varadarajan
<[email protected]> wrote:
> Prashant,
> I think we should not be reverting clean operations here. Cleans are done
> on the oldest file slices, and a restore/rollback does not completely undo
> the work of a clean that happened before it.
> For incremental timeline syncing, the embedded timeline server needs to read
> this clean metadata to sync its cached file-system view.
> Let me know your thoughts.
> Balaji.V
> On Wednesday, March 18, 2020, 11:23:09 AM PDT, Prashant Wason
> <[email protected]> wrote:
>
> Hi Team,
>
> I noticed that when a table is restored to a previous commit
> (HoodieWriteClient::restoreToInstant
> <https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java#L735>),
> only the COMMIT, DELTA_COMMIT and COMPACTION instants are rolled back and
> their corresponding files are deleted from the timeline. If there are any
> CLEAN instants, they are left behind.
>
> Is there a reason why the CLEAN instants are not removed? Won't they refer
> to files that are no longer present, and hence be of no use?
>
> Thanks
> Prashant
>