Hi,

I'm currently facing the challenge of GDPR compliance on an HDFS cluster. The most troubling part is the "right to be forgotten", which customers can invoke. Once invoked, this new right forces companies to delete all data related to that user. Since HDFS is WORM (write once, read many), you can see the issue.
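To make the issue concrete, here is a minimal sketch of what a single delete amounts to on write-once storage. It is plain Python on local files, not HDFS or Hudi code, and the JSON-lines layout, the `user_uuid` field name, and the `rewrite_without_user` helper are all assumptions made up for illustration. The point is only that removing one user's rows means rewriting the whole file and swapping it in.

```python
import json
import os
import tempfile


def rewrite_without_user(path, user_uuid, key="user_uuid"):
    """Rewrite a JSON-lines file without one user's rows; return rows dropped.

    Hypothetical helper: it stands in for the full-file rewrite that
    write-once storage forces on every delete.
    """
    dropped = 0
    directory = os.path.dirname(os.path.abspath(path))
    # The file cannot be edited in place, so write a complete new copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as out, open(path) as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            if json.loads(line).get(key) == user_uuid:
                dropped += 1  # this row belongs to the user being forgotten
                continue
            out.write(line)
    # Swap the new copy in place of the old file.
    os.replace(tmp_path, path)
    return dropped
```

With many large files and only a few matching rows per file, this full rewrite per request is the cost I would like to avoid.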
I reached out to the HDFS mailing list, and people told me that HDFS is not a fit for my use case (but I can't see myself migrating everything to Kudu/HBase). One person, however, told me to check out Hudi, and it looks very promising. Hence, I wanted to know whether my use case (deleting rows in HDFS datasets based on a user UUID) is as suitable for Hudi as I think it is.

I'm also very interested in feedback from companies using this tool in a production environment, as I'm wondering whether it is production ready. I believe Uber uses it, but I'm not aware of anyone else.

Thanks in advance for your help.

Best regards,
Ivan
