Hi,

IIUC, what you want is for the deletes to be applied to the older versions
of the data as well, so that no time-travel query can read the deleted field
again? I am afraid this cannot be achieved as-is today. One way to achieve
it would be to log these deletes against the older base files. This needs
more discussion, but the good thing is that Hudi's log-based design lends
itself to doing this. It's an interesting use case. Thanks for bringing
this up!

As a workaround, would it be possible to split the snapshot and time-travel
queries into different tables for now? i.e. the time-travel table would be
insert-only, and you can use snapshot queries on it to achieve the effect of
time travel. Then, at a later time, you can just issue a delete to get rid
of the field from all versions of the record. Maybe this makes the time
travel more expensive, I guess?
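
To make the workaround above concrete, here is a rough sketch in plain
Python. It only models the idea and does not use any Hudi API: the names
VersionedRow, as_of and delete_key are made up for illustration, and the
commit times and email values are sample data. Every version of a record is
kept as its own row in an insert-only table, "time travel" becomes an
ordinary filter on the commit time, and a later delete by key removes every
version at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedRow:
    key: str          # record key, e.g. a user id (hypothetical)
    commit_time: str  # sortable instant, e.g. "20200701000000"
    email: str        # the sensitive field that may need erasing


def as_of(rows, instant):
    """Emulate time travel: latest version per key at or before `instant`."""
    latest = {}
    for row in sorted(rows, key=lambda r: r.commit_time):
        if row.commit_time <= instant:
            latest[row.key] = row
    return latest


def delete_key(rows, key):
    """Emulate a delete: drop every stored version of the given key."""
    return [r for r in rows if r.key != key]


# Insert-only history: each update is appended as a new row.
history = [
    VersionedRow("user-1", "20200701000000", "a@example.com"),
    VersionedRow("user-1", "20200715000000", "a.new@example.com"),
    VersionedRow("user-2", "20200710000000", "b@example.com"),
]

before = as_of(history, "20200712000000")   # sees the first user-1 version
history = delete_key(history, "user-1")     # erasure request for user-1
after = as_of(history, "20200712000000")    # user-1 is gone at every instant
```

The cost shows up in as_of: each point-in-time read has to scan and
de-duplicate the stored versions, which is why the time travel may end up
more expensive this way.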


On Thu, Jul 30, 2020 at 6:08 AM Sivaprakash <sivaprakashshanmu...@gmail.com>
wrote:

> Hello
>
> What I see is: if we want to implement GDPR (
>
> https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoIdeleterecordsinthedatasetusingHudi
> )
> then the old versions of the commit files should be removed (otherwise an
> incremental query with point-in-time options can still read data that was
> deleted at a later stage). Is time-travel querying no longer possible if
> we want to implement GDPR? Are there any configurations/options to delete
> only specific records in the older commit files instead of removing the
> whole file?
>
> Thanks
>
