Hi,

IIUC, what you want is for the deletes to be applied to the older versions of the data as well, so that no time-travel query can read the deleted field again? I am afraid this cannot be achieved as-is today. One way to achieve it might be to log these deletes against the older base files. This needs more discussion, but the good thing is that Hudi's log-based design lends itself to doing this. It's an interesting use case. Thanks for bringing this up!
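To make the problem concrete, here is a toy model in plain Python. It is not Hudi code: the `VersionedTable` class and its methods are hypothetical names used only for illustration, and it assumes every commit keeps a full copy of the data that older point-in-time reads can still see.

```python
from typing import Dict, List, Tuple


class VersionedTable:
    """Toy stand-in for a table that retains one snapshot per commit."""

    def __init__(self) -> None:
        # (commit_time, snapshot) pairs in commit order; old snapshots are never rewritten.
        self.commits: List[Tuple[int, Dict[str, dict]]] = []

    def _copy_latest(self) -> Dict[str, dict]:
        return dict(self.commits[-1][1]) if self.commits else {}

    def upsert(self, commit_time: int, key: str, row: dict) -> None:
        snapshot = self._copy_latest()
        snapshot[key] = row
        self.commits.append((commit_time, snapshot))

    def delete(self, commit_time: int, key: str) -> None:
        # The delete only affects the new snapshot written by this commit.
        snapshot = self._copy_latest()
        snapshot.pop(key, None)
        self.commits.append((commit_time, snapshot))

    def read_as_of(self, commit_time: int) -> Dict[str, dict]:
        # Point-in-time read: the newest snapshot at or before commit_time.
        result: Dict[str, dict] = {}
        for t, snapshot in self.commits:
            if t <= commit_time:
                result = snapshot
        return result

    def read_latest(self) -> Dict[str, dict]:
        return self.commits[-1][1] if self.commits else {}


table = VersionedTable()
table.upsert(1, "u1", {"email": "a@example.com"})
table.upsert(2, "u2", {"email": "b@example.com"})
table.delete(3, "u1")

latest = table.read_latest()   # "u1" is gone here
as_of_2 = table.read_as_of(2)  # but "u1" is still readable as of commit 2
```

In this model the delete at commit 3 never touches the snapshots written at commits 1 and 2, which is the behaviour the question is about.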
As a workaround, would it be possible to split the snapshot and time-travel queries into different tables for now? That is, the time-travel table would be insert-only, and you can use snapshot queries on it to achieve the effect of time travel. At a later time, you can then just issue a delete to get rid of the field from all versions of the record. Maybe this makes the time travel more expensive, I guess?

On Thu, Jul 30, 2020 at 6:08 AM Sivaprakash <sivaprakashshanmu...@gmail.com> wrote:

> Hello
>
> What I see is; If I we want to implement GDPR (
> https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoIdeleterecordsinthedatasetusingHudi
> )
> then old version of commit files should be removed (otherwise incremental
> query with point-time options can still read the data which is deleted in
> latter stage). Time travel query is not possible anymore if we want to
> implement GDPR? any configurations/options to delete only specific records
> in the older commit files instead of removing the whole file?
>
> Thanks
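The workaround suggested in the reply (a separate insert-only table for time travel) can be sketched roughly as follows. This is again a plain-Python toy, not Hudi code: `InsertOnlyHistory`, `as_of` and `delete_key` are hypothetical names, and the sketch models only the latest snapshot of that table, assuming every version of a record is stored as its own row with a timestamp.

```python
from typing import Dict, List


class InsertOnlyHistory:
    """Toy insert-only table: each record version is a separate row."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def insert(self, key: str, ts: int, payload: dict) -> None:
        # Versions are appended, never updated in place.
        self.rows.append({"key": key, "ts": ts, **payload})

    def as_of(self, ts: int) -> Dict[str, dict]:
        # "Time travel" as an ordinary query: newest row per key with row ts <= ts.
        newest: Dict[str, dict] = {}
        for row in self.rows:
            if row["ts"] <= ts:
                current = newest.get(row["key"])
                if current is None or row["ts"] >= current["ts"]:
                    newest[row["key"]] = row
        return newest

    def delete_key(self, key: str) -> None:
        # A single delete by key removes every stored version of the record.
        self.rows = [r for r in self.rows if r["key"] != key]


history = InsertOnlyHistory()
history.insert("u1", 1, {"email": "a@example.com"})
history.insert("u1", 2, {"email": "a2@example.com"})
history.insert("u2", 2, {"email": "b@example.com"})

before = history.as_of(1)        # sees the first version of "u1"
history.delete_key("u1")
after_old = history.as_of(1)     # "u1" is no longer visible at any point in time
after_new = history.as_of(2)     # only "u2" remains
```

The extra cost mentioned in the reply shows up in `as_of`, which has to scan and de-duplicate all versions on every query.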