kbuci commented on issue #17866:
URL: https://github.com/apache/hudi/issues/17866#issuecomment-3782782416

   > Can you confirm we are only interested in latest file slices and not older 
ones. But when we restore back, we may not be able to do timetravel queries. 
Only snapshot will be feasible. Just wanted to clarify on the requirements. 
   
   Yes, we only want the latest file slices. We should make it so that timetravel queries return no data (0 rows) for that partition, but we should still be able to do timetravel queries on other partitions (where stash was never attempted). Is that feasible?
   
   > btw, in this case, I assume stashing will be synchronous right. i.e. the 
partition can never be marked as deleted for consumers until the stashing 
completes successfully. 
   
   That's right, there should be a synchronous API call that only returns success once the partition has actually been stashed, since we don't want a case where the partition was deleted before it was stashed (due to a transient failure in the middle).
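
   Roughly the shape I have in mind (purely illustrative; `stashPartitions` and its return type are hypothetical, not an existing Hudi API):

```scala
// Hypothetical synchronous stash API: the call blocks until the partition's latest
// file slices have been copied to the stash location, and only then returns success.
// If it fails, consumers must not treat the partition as deleted.
case class StashResult(partition: String, stashLocation: String, stashedFileCount: Long)

trait PartitionStashClient {
  /** Blocks until the given partitions are fully stashed under stashRoot. */
  def stashPartitions(partitions: Seq[String], stashRoot: String): Seq[StashResult]
}
```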
   
   > What relation does this have wrt "delete_partition" operation we already have. Is it an add-on to the "delete_partition" operation, wherein instead of nuking the contents of the partition (which is the default behavior w/ delete partition), here we move the contents to a new folder, but still continue to mark the partition as unavailable for data table consumers?
   
   Based on my understanding of this existing OSS API, the two differences are
   - in the OSS API, the files are only physically deleted later, once clean runs and the delete_partition instant is no longer in the clean window (last N commits to retain, etc.)
   - both APIs make sure the data/partition is immediately not queryable, but the stash API will also move the data files elsewhere, as you mentioned (if the operation succeeds)
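
   For comparison, this is roughly how the existing OSS delete_partition operation is driven through the Spark datasource today (the option keys are the ones I recall from the Hudi docs; table name, path and partition value are placeholders, so please double-check against your version):

```scala
// Existing delete_partition flow, for comparison with the proposed stash API:
// the partition stops being queryable immediately, but files are only removed later by clean.
// Assumes an active SparkSession `spark`, e.g. in spark-shell.
import org.apache.spark.sql.SaveMode

// Any dataframe with the table's schema works; the partitions-to-delete option drives the operation.
val df = spark.read.format("hudi").load("/tables/my_table").limit(1)

df.write
  .format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.datasource.write.operation", "delete_partition")
  .option("hoodie.datasource.write.partitions.to.delete", "2024/01/01")
  .mode(SaveMode.Append)
  .save("/tables/my_table")
```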
   
   > What in case there are concurrent writes going into the partition of interest when "stashPartitions" operation is invoked?
   
   It's fine if concurrent writes (serialized after the stash op) re-create the partition. Also, if there are inflight data files, we would expect them to be "lost" (since they are not part of the latest snapshot). We are fine with this for our use case.
   
   > In case of MOR table, this could also mean we back up log files as well and not just base files. Is my understanding right?
   
   Yes, but this raises a good point: we need to be careful with the MOR implementation, specifically for the scenario where the partition is part of an inflight compaction plan when a stash operation is called. To be on the safe side, I think we can initially block the stash operation if a partition has any data files that are part of an inflight compaction plan (see the sketch below).
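
   As a rough sketch of that guard (the surrounding stash flow is hypothetical; `CompactionUtils.getAllPendingCompactionOperations` is an existing Hudi utility, assuming it behaves the same in recent releases):

```scala
// Sketch: refuse to stash a partition whose file groups appear in any pending compaction plan.
import scala.collection.JavaConverters._
import org.apache.hudi.common.table.HoodieTableMetaClient
import org.apache.hudi.common.util.CompactionUtils

def partitionHasPendingCompaction(metaClient: HoodieTableMetaClient, partition: String): Boolean =
  CompactionUtils.getAllPendingCompactionOperations(metaClient)
    .keySet().asScala
    .exists(_.getPartitionPath == partition)

// Hypothetical guard inside the proposed stash operation:
// if (partitionHasPendingCompaction(metaClient, partition))
//   throw new HoodieException(s"Cannot stash $partition: a pending compaction plan touches it")
```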
   
   > Can we do insert_overwrite operation in this case. Do note that commit times for the data might differ if we take this route after restoring. But this might be cleaner. If not, we might need to do special handling of updates to metadata table writes. With streaming writer support in 1.x, might be challenging as well.
   
   We are fine if the commit times are different in the restored partition compared to when the partition was stashed. But just to clarify, do you mean reading all records from the files in the stashed folder and using the existing insert_overwrite to just insert those back in?
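
   In other words, something along these lines (paths are placeholders, this assumes the stashed folder still holds plain parquet base files, and an active SparkSession `spark` is assumed):

```scala
// Sketch of the restore-via-insert_overwrite idea being asked about.
// Hudi meta columns are dropped before re-inserting, since new commit times will be assigned.
import org.apache.spark.sql.SaveMode

val stashed = spark.read.parquet("/stash/my_table/2024/01/01")
  .drop("_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
        "_hoodie_partition_path", "_hoodie_file_name")

stashed.write
  .format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.datasource.write.operation", "insert_overwrite")
  .mode(SaveMode.Append)
  .save("/tables/my_table")
```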
   
   > Requirements: can you throw some more light on this requirement
   > failures and rollbacks: If the operation fails after creating a plan, then it should be eventually rolled back by a rollback call (as part of clean’s rollback of failed writes). The rollback implementation should consist of “undoing” all the DFS operations: after rollback is completed, any partitions that were attempted to be stashed should still have their (latest) data files and any partitions attempted to be restored should still remain empty.
   
   Sure, the idea here is that if the stash/restore operations fail, then we should be able to rely on `clean` to roll them back, since the stash/restore attempt might not be re-tried.
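
   For context, the existing knob that lets clean take care of failed writes is the failed-writes cleaner policy; something like the following (exact keys as I recall them from recent Hudi releases, please double-check):

```scala
// Rely on clean to lazily roll back failed writes, so a failed stash/restore attempt
// would eventually be undone even if it is never retried.
val cleanOpts = Map(
  "hoodie.cleaner.policy.failed.writes" -> "LAZY",
  "hoodie.clean.automatic"              -> "true"
)
```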
   
   > Wanted to brainstorm on some idea towards the requirement:
   
   Thanks for sharing. One constraint we have is that we don't know exactly how long we need to keep the stashed partition around for a future restore, so we cannot set up an automatic "delete if the user doesn't ask for a restore within N hours" TTL.
   
   If we want to split the stash/restore API each into multiple operations though, there was another approach/design I had briefly brainstormed. The idea is that we add a new HUDI operation to "unregister" a partition, where the partition directory will still stay on DFS as-is, but its files/records will no longer be part of the HUDI dataset (and won't be queryable or in the MDT). And similarly a "register" operation that will make all data in the DFS partition "appear" in the HUDI dataset. The idea is that for stashing, we can "unregister" the partition, then use DFS operations to rename/copy the folder to the stash folder (and remove files not part of the latest snapshot). And to recover, we can again use DFS operations to make sure the DFS partition is added back to the same location before calling "register". The advantage of this approach is that the implementation of "moving" files back and forth from the stash location doesn't need to use HUDI APIs, and it makes it easier to reason about rollbacks. The drawback though is that we may have to consider how to handle cases where other operations attempt to write to an "unregistered" partition - since ideally they should fail.
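
   To make that concrete, here is a very rough sketch of that flow. The `unregisterPartition` / `registerPartition` / `stashPartition` / `restorePartition` names are hypothetical and do not exist in Hudi today; the point is only to illustrate combining a metadata-only operation with plain DFS renames:

```scala
// Hypothetical sketch only: "unregister"/"register" are proposed operations, not existing Hudi APIs.
import org.apache.hadoop.fs.{FileSystem, Path}

// Proposed (non-existent) operations, stated as a trait for the sketch:
trait HypotheticalWriteClient {
  def unregisterPartition(partition: String): Unit
  def registerPartition(partition: String): Unit
}

def stashPartition(client: HypotheticalWriteClient, fs: FileSystem,
                   tablePath: String, partition: String, stashRoot: String): Unit = {
  // 1. Metadata-only step: drop the partition from the Hudi dataset / MDT,
  //    leaving the files on DFS untouched.
  client.unregisterPartition(partition)

  // 2. Plain DFS step: move the partition directory under the stash root.
  //    (A real implementation would also prune files that are not in the latest file slices.)
  fs.rename(new Path(tablePath, partition), new Path(stashRoot, partition))
}

def restorePartition(client: HypotheticalWriteClient, fs: FileSystem,
                     tablePath: String, partition: String, stashRoot: String): Unit = {
  // 1. Plain DFS step: move the stashed directory back to its original location.
  fs.rename(new Path(stashRoot, partition), new Path(tablePath, partition))

  // 2. Metadata-only step: make the files in that directory visible in the Hudi dataset / MDT again.
  client.registerPartition(partition)
}
```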
   
   

