aditiwari01 commented on issue #2743: URL: https://github.com/apache/hudi/issues/2743#issuecomment-810782992
I was thinking along similar lines, but we do have continuous jobs (not DeltaStreamer, but Spark streaming jobs with 5/10-minute mini-batches). We can't run a separate job for deletion since we do not support concurrent writers.

Another possible solution is to partition the table by commit time and then run manually scheduled cleanup jobs that delete the older partitions. The challenges here are: (1) I'm not sure whether deleting a partition from outside Hudi can mess up the Hoodie metadata in some way; (2) this would require global indexing to avoid duplicates, which in turn can increase latencies.

What are your thoughts on this?
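For concreteness, the commit-time-partitioning idea above might look roughly like the writer configuration below. This is only a sketch: the field names (`record_id`, `commit_ts`), the table name, and the choice of `GLOBAL_BLOOM` are assumptions for illustration, not settings from this thread.

```python
# Sketch of Hudi write options for partitioning a table by commit time,
# so that old data can later be removed a partition at a time.
# Field names (record_id, commit_ts) are hypothetical placeholders.

def build_hudi_options(table_name: str) -> dict:
    return {
        "hoodie.table.name": table_name,
        # Record key and precombine field are placeholders for this example.
        "hoodie.datasource.write.recordkey.field": "record_id",
        "hoodie.datasource.write.precombine.field": "commit_ts",
        # Partition each mini-batch by its commit timestamp so that a
        # scheduled cleanup job can drop whole partitions older than the
        # retention window.
        "hoodie.datasource.write.partitionpath.field": "commit_ts",
        "hoodie.datasource.write.keygenerator.class":
            "org.apache.hudi.keygen.TimestampBasedKeyGenerator",
        # A global index is needed so a record re-ingested in a later
        # mini-batch updates its existing row instead of creating a
        # duplicate in a newer partition -- the latency trade-off
        # mentioned above.
        "hoodie.index.type": "GLOBAL_BLOOM",
    }

opts = build_hudi_options("events")
```

In a streaming job this dict would be passed to the Hudi sink, e.g. `df.write.format("hudi").options(**opts).mode("append").save(base_path)`.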
