kbuci commented on issue #17866: URL: https://github.com/apache/hudi/issues/17866#issuecomment-4101404257
@nsivabalan thanks. Based on our prior discussions, let me summarize the planned approach to implementing stashing.

To keep the discussion below DFS-agnostic, let's assume that when we call `rename` we are internally calling a helper function that
- If the filesystem is not HDFS, copies all data to dest and then deletes it from source
- Otherwise does a rename
- Is idempotent, in the sense that if it is called again on the same (source, dest), it handles all cases (everything still in source, some files already in dest, etc.)

We will create a custom `SparkPreCommitValidator` that, when executed, does the following:
```
1. Read the stash partition parent folder from the extraMetadata of the ongoing operation (asserting that it is delete_partition)
2. Create an empty map
3. Create a Spark task for each partition. Within this task, get the source path (basepath/partition) and the dest path (stash_folder/partition), creating the latter folders if not already created.
  3a. If the source path is completely empty, mark the partition in the map as FAILED
  3b. Otherwise, call rename on source -> dest. If it succeeds, mark as SUCCESS. Otherwise, mark as FAILED
4. Write out this partition->status map to some file in the basepath, maybe /.hoodie/.stash/<instant time> ?
```
Now we can call the `deletePartitions` API with this "validator".

Note that we should check that calling `deletePartitions` on partitions that are empty (not even having the .hoodie partition metafile and/or no longer in MDT) does not fail and still replaces the files in the other (non-empty) partitions. If this is an issue, we can relax (3) so that the validator marks 3a as SUCCESS and fails only if a rename operation fails.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
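
As a rough illustration of the idempotent `rename` helper and steps 3a/3b described in the comment, here is a minimal Python sketch. It uses the local filesystem (`pathlib`, `shutil`) as a stand-in for the DFS and covers only the copy-then-delete branch. The function names `idempotent_move` and `stash_partition` and the temp-file suffix are hypothetical and not part of Hudi; the real validator would run inside Spark tasks against Hudi's storage abstraction.

```python
import shutil
from pathlib import Path


def idempotent_move(source: Path, dest: Path) -> None:
    """Copy every file under `source` into `dest`, then delete it from `source`.

    Safe to call again on the same (source, dest): files already moved are
    gone from `source`, and files copied but not yet deleted are re-copied.
    """
    dest.mkdir(parents=True, exist_ok=True)
    if not source.exists():
        return  # an earlier call already moved everything
    # Materialize the listing first, since files are deleted while iterating.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    for src_file in files:
        target = dest / src_file.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        # Copy under a temporary name, then swap it in, so an interrupted
        # copy never leaves a truncated file under the final name.
        tmp = target.with_name(target.name + ".tmp")  # hypothetical suffix
        shutil.copy2(src_file, tmp)
        tmp.replace(target)
        src_file.unlink()
    shutil.rmtree(source)  # remove the now-empty directory tree


def stash_partition(base_path: Path, stash_folder: Path, partition: str) -> str:
    """Steps 3a/3b for a single partition; returns "SUCCESS" or "FAILED"."""
    source = base_path / partition
    dest = stash_folder / partition
    dest.mkdir(parents=True, exist_ok=True)
    has_files = source.exists() and any(p.is_file() for p in source.rglob("*"))
    if not has_files:
        return "FAILED"  # step 3a: source is completely empty
    try:
        idempotent_move(source, dest)  # step 3b
        return "SUCCESS"
    except OSError:
        return "FAILED"
```

With this sketch, calling `idempotent_move` after a partial earlier run (some files already in dest) finishes the move, and calling it again afterwards is a no-op. Under step 3a as written, a second `stash_partition` call on an already-stashed partition reports FAILED because the source is empty by then.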
