kbuci commented on issue #17866: URL: https://github.com/apache/hudi/issues/17866#issuecomment-3782782416
> Can you confirm we are only interested in latest file slices and not older ones. But when we restore back, we may not be able to do timetravel queries. Only snapshot will be feasible. Just wanted to clarify on the requirements.

Yes, we only want the latest file slices. Time-travel queries against a stashed partition should return no data (0 rows), but time-travel queries on other partitions (where a stash was never attempted) should still work. Is that feasible?

> btw, in this case, I assume stashing will be synchronous right. i.e. the partition can never be marked as deleted for consumers until the stashing completes successfully.

That's right: there should be a synchronous API call that returns success only if the partition was stashed, since we don't want a case where the partition was deleted before it was stashed (due to a transient failure in the middle).

> What relation does this have wrt "delete_partition" operation we already have. Is it an add-on to "delete_partition" operation, where instead of nuking the contents of the partition (which is the default behavior w/ delete partition), here we move the contents to a new folder, but still continue to mark the partition as unavailable for data table consumers?

Based on my understanding of the existing OSS API, the two differences are:
- In the OSS API, the files are only deleted later, once clean runs and the delete-partition instant is no longer in the clean window (last N commits to retain, etc.).
- Both APIs make the data/partition immediately unqueryable, but the stash API will also move the data files elsewhere, as you mentioned (if the operation succeeds).

> What incase there are concurrent writes going into the partition of interest when "stashPartitions" operation is invoked?

It's fine if concurrent writes (serialized after the stash operation) re-create the partition. If there are inflight data files, we would expect them to be "lost" (since they would not be part of the latest snapshot). We are fine with this for our use case.

> Incase of MOR table, this could also mean, we back up log files as well and not just base files. Is my understanding right?

Yes, but this raises a good point: we need to be careful with the MOR implementation, specifically for the scenario where the partition is part of an inflight plan when a stash operation is called. To be on the safe side, we can initially block the stash operation if a partition has any data files that are part of an inflight compaction plan (see the guard sketch below).

> Can we do insert_overwrite operation in this case. Do note that, commit times for the data might differ if we take this route after restoring. But this might be cleaner. If not, we might need to do special handling of updates to metadata table writes. With streaming writer support in 1.x, might be challenging as well.

We are fine if the commit times in the restored partition differ from when the partition was stashed. But just to clarify: do you mean reading all records from the files in the stashed folder and using the existing insert_overwrite to insert them back?
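If so, here is roughly what I'm picturing, as a sketch (spark-shell style, assuming `spark` and the table's `basePath` are in scope; `stashRoot`, the partition value, and the table name are illustrative, and the write config is abridged). Note this reads the stashed base files as parquet, so MOR log files would need separate handling:

```scala
import org.apache.spark.sql.SaveMode

val stashRoot = "s3://bucket/warehouse/.stash/my_table" // illustrative stash location

// Read the records back out of the stashed partition folder and
// insert_overwrite them into the original partition of the table.
val stashed = spark.read.parquet(s"$stashRoot/2024/01/01")

stashed.write.format("hudi")
  .option("hoodie.datasource.write.operation", "insert_overwrite")
  .option("hoodie.table.name", "my_table") // remaining write configs omitted
  .mode(SaveMode.Append)
  .save(basePath)
```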
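Going back to the time-travel point from the first question above, the read-side behavior we'd expect looks like this (again assuming `spark` and `basePath` are in scope; `as.of.instant` is the existing Hudi time-travel read option, while the instant, partition column, and partition values are made up):

```scala
import org.apache.spark.sql.functions.col

val asOf = spark.read.format("hudi")
  .option("as.of.instant", "20240101000000") // an instant from before the stash
  .load(basePath)

asOf.filter(col("partition_path") === "2024/01/01").count() // stashed partition: expect 0 rows
asOf.filter(col("partition_path") === "2024/01/02").count() // untouched partition: results as usual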
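And to spell out the synchronous contract, the rough API shape I have in mind (nothing below exists in Hudi today; the names and signatures are purely illustrative):

```scala
// Hypothetical shape of the synchronous stash/restore API discussed above.
trait PartitionStashClient {
  /** Returns only after the partition's latest file slices are moved to the
    * stash location AND the partition is unavailable to data table consumers.
    * Throws on failure, so we never end up "deleted but not stashed". */
  def stashPartition(partitionPath: String): Unit

  /** Moves the stashed files back into the table; the restored data may carry
    * different commit times than the original writes. */
  def restorePartition(partitionPath: String): Unit
}
```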
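For reference, the existing `delete_partition` flow we're comparing against (these are the real datasource options; the partition value is made up and the write config is abridged):

```scala
import org.apache.spark.sql.SaveMode

// Existing OSS behavior: the partition becomes unqueryable immediately,
// but its files are only physically removed later, by clean.
spark.emptyDataFrame.write.format("hudi")
  .option("hoodie.datasource.write.operation", "delete_partition")
  .option("hoodie.datasource.write.partitions.to.delete", "2024/01/01")
  .option("hoodie.table.name", "my_table") // remaining write configs omitted
  .mode(SaveMode.Append)
  .save(basePath)
```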
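And for the MOR safeguard, a minimal sketch of the guard I mean, assuming `CompactionUtils.getAllPendingCompactionOperations` keeps its current shape (pending file group id -> (instant time, compaction operation)):

```scala
import scala.collection.JavaConverters._
import org.apache.hudi.common.table.HoodieTableMetaClient
import org.apache.hudi.common.util.CompactionUtils

// Refuse to stash a partition that has file groups referenced by any
// pending (requested or inflight) compaction plan.
def assertStashable(metaClient: HoodieTableMetaClient, partitionPath: String): Unit = {
  val pendingFileGroups = CompactionUtils.getAllPendingCompactionOperations(metaClient).asScala.keys
  if (pendingFileGroups.exists(_.getPartitionPath == partitionPath)) {
    throw new IllegalStateException(
      s"Partition $partitionPath is part of an inflight compaction plan; stash is blocked")
  }
}
```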
> Requirements: can you throw some more light on this requirement: failures and rollbacks

Sure. The requirement is: if the operation fails after creating a plan, it should eventually be rolled back by a rollback call (as part of clean's rollback of failed writes). The rollback implementation should consist of "undoing" all the DFS operations: after rollback completes, any partition that was being stashed should still have its (latest) data files, and any partition that was being restored should still remain empty. The idea here is that if a stash/restore operation fails, we should be able to rely on `clean` to roll it back, since the stash/restore attempt might not be retried.

> Wanted to brainstorm on some idea towards the requirement:

Thanks for sharing. One constraint we have is that we don't know how long we need to keep a stashed partition around for a future restore, so we cannot set up an automatic "delete if the user doesn't ask for a restore within N hours" TTL.

If we do want to split the stash/restore APIs into multiple operations, though, there is another approach I briefly brainstormed. The idea is to add a new Hudi operation to "unregister" a partition: the DFS folder stays in the dataset as-is, but its files/records are no longer part of the Hudi dataset (not queryable and not in the MDT). Similarly, a "register" operation would make all the data in a DFS partition "appear" in the Hudi dataset. For stashing, we would "unregister" the partition, then use DFS operations to rename/copy the folder to the stash location (removing files that are not part of the latest snapshot). To recover, we would again use DFS operations to move the folder back to its original location before calling "register" (see the sketch below).

The advantage of this approach is that moving files back and forth from the stash location doesn't need to go through Hudi APIs, and it makes rollbacks easier to reason about. The drawback is that we would have to consider how to handle other operations attempting to write to an "unregistered" partition, since ideally they should fail.
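A minimal sketch of that decomposition, assuming hypothetical `unregisterPartition`/`registerPartition` operations (again, neither exists in Hudi today) and plain Hadoop `FileSystem` moves:

```scala
import java.io.IOException
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative only: how stash/restore would decompose if Hudi had
// register/unregister operations for partitions.
trait HypotheticalPartitionRegistry {
  def unregisterPartition(partitionPath: String): Unit // files stay on DFS, but leave the timeline/MDT
  def registerPartition(partitionPath: String): Unit   // existing DFS files become visible again
}

def stashViaUnregister(registry: HypotheticalPartitionRegistry, fs: FileSystem,
                       partition: Path, stash: Path): Unit = {
  registry.unregisterPartition(partition.toString) // partition stops being queryable
  // Files not in the latest snapshot would be pruned before or after this move.
  if (!fs.rename(partition, stash))
    throw new IOException(s"Failed to move $partition to $stash")
}

def restoreViaRegister(registry: HypotheticalPartitionRegistry, fs: FileSystem,
                       partition: Path, stash: Path): Unit = {
  if (!fs.rename(stash, partition)) // plain DFS move; no Hudi write path involved
    throw new IOException(s"Failed to move $stash back to $partition")
  registry.registerPartition(partition.toString)
}
```

The appeal is that only the register/unregister steps would touch Hudi metadata, so a failed DFS move could be retried or undone without reasoning about partially written commits.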
