aditiwari01 commented on issue #2743:
URL: https://github.com/apache/hudi/issues/2743#issuecomment-810782992


   I was thinking along similar lines, but we do have continuous jobs (not 
DeltaStreamer, but Spark Streaming jobs with 5/10-minute mini-batches). We can't 
have a separate job for deletion since we do not support concurrent writers.
   
   Another possible solution would be to partition the table by commit 
time and have a scheduled manual cleanup job that deletes older 
partitions. The challenges here are that I am not sure whether deleting a 
partition from outside Hudi could mess up the Hoodie metadata in some way. Also, this would 
require us to use global indexing to avoid duplicates, which in turn can 
result in increased latencies.
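   To make the idea concrete, here is a minimal sketch (names and the 
partition-path format are illustrative assumptions, not from this thread) of the 
retention logic such a cleanup job would run: given commit-time partition paths 
like `2021/03/30`, select the ones older than the retention window.

   ```python
   from datetime import datetime, timedelta

   def partitions_to_drop(partitions, retention_days, today=None):
       """Return commit-time partitions (format 'YYYY/MM/DD', an assumed
       convention) that fall outside the retention window."""
       today = today or datetime.utcnow()
       cutoff = today - timedelta(days=retention_days)
       return [p for p in partitions
               if datetime.strptime(p, "%Y/%m/%d") < cutoff]

   # Example: with a 7-day retention as of 2021-03-30,
   # only the January partition is old enough to drop.
   old = partitions_to_drop(["2021/01/01", "2021/03/25", "2021/03/29"],
                            retention_days=7,
                            today=datetime(2021, 3, 30))
   # old == ["2021/01/01"]
   ```

   The actual deletion step should presumably go through Hudi itself (e.g. a 
delete operation against those partitions) rather than removing files directly, 
precisely because of the metadata-consistency concern above.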
   
   What are your thoughts on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
