[ https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962147#comment-16962147 ]
Vinoth Chandar commented on HUDI-15:
------------------------------------

1. Yes. We already support deletes like that by issuing upsert(). We need to expose this through a `delete()` API that needs only an `RDD[HoodieKey]`, as opposed to the `RDD[HoodieRecord]` that the upsert() API requires. Rollback is taken care of automatically for both CopyOnWrite and MergeOnRead (see the `HoodieDeleteBlock` class).

2. Not following fully. Compaction will remove the record from the base file if the key was part of a delete block and no further upserts happened to that key after it.

3. There won't be a physical empty record. All we do is skip a deleted record when rewriting the base parquet file, i.e. physically remove (hard delete) it from the latest version of the parquet file. Both the merge handle in COW and compaction in MOR do this. Rollbacks will continue to work as intended. In COW, the record is removed from the latest file slice's parquet file, but older file slices still contain it until the cleaner deletes them. So if you need to roll back the delete, you just delete the latest file slice's parquet file.

4. It won't.

5. Yes. Silently ignoring it is OK for now.

6. We should treat that update as a regular write. That's both the current and the ideal behavior. Do you see any issue?

Hope that helps.

> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> ------------------------------------------------------------------------
>
>                 Key: HUDI-15
>                 URL: https://issues.apache.org/jira/browse/HUDI-15
>             Project: Apache Hudi (incubating)
>          Issue Type: New Feature
>          Components: Spark datasource, Write Client
>            Reporter: Vinoth Chandar
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.5.1
>
>
> The delete API needs to be supported as a first-class citizen via
> DeltaStreamer, WriteClient and datasources. Currently there are two ways to
> delete: soft deletes and hard deletes
> (https://hudi.apache.org/writing_data.html#deletes).
>
> We need to ensure that, for hard deletes, we can leverage
> EmptyHoodieRecordPayload with just the HoodieKey and an empty record value
> to perform the delete.
> [https://github.com/uber/hudi/issues/531]
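The delete semantics described in the comment (a delete needs only keys, the COW merge skips deleted keys when writing a new file slice, MOR compaction drops a key only if no upsert followed its delete block, and rollback discards the latest slice) can be modeled in a few lines. This is a toy, in-memory sketch under stated assumptions, not Hudi's implementation: `HoodieKey`, `delete_records`, `cow_merge`, `mor_compact` and `FileGroup` are hypothetical Python stand-ins, base files are plain dicts, `None` plays the role of the empty payload carried by EmptyHoodieRecordPayload, and the cleaner is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class HoodieKey:
    """Hypothetical stand-in for Hudi's HoodieKey: record key plus partition path."""
    record_key: str
    partition_path: str


# A base (parquet) file of one file slice, modeled as key -> record value.
BaseFile = Dict[HoodieKey, dict]
# A hypothetical log block: ("upsert", {key: value}) or ("delete", [keys]).
LogBlock = Tuple[str, Union[Dict[HoodieKey, dict], List[HoodieKey]]]


def delete_records(keys: List[HoodieKey]) -> Dict[HoodieKey, Optional[dict]]:
    """A delete needs only keys: pair each key with an empty (None) payload."""
    return {key: None for key in keys}


def cow_merge(base: BaseFile, incoming: Dict[HoodieKey, Optional[dict]]) -> BaseFile:
    """Copy-on-write merge: build a new base file, skipping keys whose incoming
    payload is empty (hard delete) and applying every other record as an upsert."""
    merged = dict(base)
    for key, value in incoming.items():
        if value is None:
            merged.pop(key, None)  # deleted: not written; unknown keys are ignored
        else:
            merged[key] = value    # regular write (update or insert)
    return merged


def mor_compact(base: BaseFile, log_blocks: List[LogBlock]) -> BaseFile:
    """Merge-on-read compaction: replay log blocks in commit order, so a key stays
    removed only if no upsert touched it after its delete block."""
    compacted = dict(base)
    for kind, payload in log_blocks:
        if kind == "delete":
            for key in payload:
                compacted.pop(key, None)
        elif kind == "upsert":
            compacted.update(payload)
        else:
            raise ValueError(f"unknown block type: {kind}")
    return compacted


class FileGroup:
    """Keeps every COW file slice (a real cleaner would eventually drop old ones)."""

    def __init__(self, base: BaseFile):
        self.slices: List[BaseFile] = [base]

    def latest(self) -> BaseFile:
        return self.slices[-1]

    def commit(self, incoming: Dict[HoodieKey, Optional[dict]]) -> None:
        self.slices.append(cow_merge(self.latest(), incoming))

    def rollback(self) -> None:
        """Undo the last commit by discarding the latest file slice; the older
        slice still holds any record that the commit deleted."""
        if len(self.slices) > 1:
            self.slices.pop()


if __name__ == "__main__":
    k1 = HoodieKey("id-1", "2019/10/29")
    k2 = HoodieKey("id-2", "2019/10/29")

    group = FileGroup({k1: {"v": 1}, k2: {"v": 2}})
    group.commit(delete_records([k1]))
    print(sorted(k.record_key for k in group.latest()))  # ['id-2']
    group.rollback()
    print(sorted(k.record_key for k in group.latest()))  # ['id-1', 'id-2']

    # k1 is deleted and then upserted again, so compaction keeps it; k2 stays deleted.
    blocks = [("delete", [k1, k2]), ("upsert", {k1: {"v": 3}})]
    compacted = mor_compact({k1: {"v": 1}, k2: {"v": 2}}, blocks)
    print({k.record_key: v for k, v in compacted.items()})  # {'id-1': {'v': 3}}
```

Replaying blocks strictly in commit order is what makes the "no further upserts after the delete" rule fall out naturally, and it also treats a write that follows a delete as an ordinary write.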