jens4doc opened a new issue, #10581:
URL: https://github.com/apache/hudi/issues/10581

   For a Hudi table the goal is to apply GDPR (Right To Be Forgotten) and 
delete the row with key Y from table_x. If I perform a hard delete for key Y, the 
row is only deleted in the latest commit.
   How can I make sure the key is deleted from all commits of the Hudi table 
(otherwise the right to be forgotten cannot be applied)?
   
   I did a POC: I executed a hard delete, which should delete the complete row.
   
   Delete example:
   ```
   hard_delete_df = spark.sql("SELECT * FROM table_x where emp_id='Y' ")
   hudi_options['hoodie.datasource.write.operation'] = 'delete'
   
   hard_delete_df.write.format("hudi").options(**hudi_options).mode("append").save(final_base_path)
   ```
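
The delete snippet relies on a `hudi_options` dict that is not shown. A minimal sketch of what it might look like, assuming `emp_id` is the record key; the table name value, the precombine field `ts`, and the helper `with_operation` are hypothetical. Copying the base options instead of mutating them keeps the `delete` operation from leaking into later writes:

```python
# Hypothetical base options; the table name and the precombine field "ts"
# are assumptions, not taken from the real job.
BASE_HUDI_OPTIONS = {
    "hoodie.table.name": "table_x",
    "hoodie.datasource.write.recordkey.field": "emp_id",
    "hoodie.datasource.write.precombine.field": "ts",
}


def with_operation(base_options, operation):
    """Return a copy of base_options with the write operation set."""
    options = dict(base_options)  # shallow copy, base dict stays untouched
    options["hoodie.datasource.write.operation"] = operation
    return options


delete_options = with_operation(BASE_HUDI_OPTIONS, "delete")

# Usage with the DataFrame from the example above (needs Spark and Hudi):
# hard_delete_df.write.format("hudi").options(**delete_options) \
#     .mode("append").save(final_base_path)
```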
   Time travel example that reads the table as of a commit BEFORE the commit 
that contains the delete:
   ```
   df_commitbeforedelete = spark.read \
     .format("org.apache.hudi")\
     .option("as.of.instant", "timebeforedelete") \
     .load("s3a://hudi-s3/table_x")
   df_commitbeforedelete.show()
   ```
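
`"timebeforedelete"` is a placeholder for an instant that precedes the delete commit. The Hudi time travel documentation shows instants written in forms such as `2021-07-28 14:11:08.200`. A small sketch that builds such a string from a Python `datetime`, assuming this format is accepted by the Hudi version in use; the helper name `to_hudi_instant` and the date are made up for illustration:

```python
from datetime import datetime


def to_hudi_instant(moment):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS.mmm' (millisecond precision)."""
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%d %H:%M:%S.") + f"{millis:03d}"


# Hypothetical point in time just before the delete commit.
instant_before_delete = to_hudi_instant(datetime(2024, 1, 29, 10, 15, 30, 250000))

# Usage with the read from the example above (needs Spark and Hudi):
# spark.read.format("org.apache.hudi") \
#     .option("as.of.instant", instant_before_delete) \
#     .load("s3a://hudi-s3/table_x")
```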

