[ 
https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962147#comment-16962147
 ] 

Vinoth Chandar commented on HUDI-15:
------------------------------------

1. Yes. We already support deletes like that by issuing upsert(). We need to 
expose this via a `delete()` API that only needs an `RDD[HoodieKey]`, as 
opposed to the `RDD[HoodieRecord]` that the upsert() API requires. Rollback is 
handled automatically for CopyOnWrite/MergeOnRead (see the `HoodieDeleteBlock` 
class). 
2. Not fully following. Compaction will remove the record from the base file 
if the key was part of a delete block and no further upserts happened to the 
key after that. 
3. There won't be a physical empty record. All we do is skip the deleted 
record when rewriting the base parquet file, i.e. physically remove/hard 
delete it from the latest version of the parquet file (both the merge handle 
in COW and compaction in MOR do this). Rollbacks will continue to work as 
intended. In COW, the record would be removed from the latest file slice's 
parquet file, but the older slices would have it until the cleaner deletes 
them. So if you need to roll back the delete, you just delete the latest file 
slice's parquet file. 
4. It won't.
5. Yes. Silently ignoring is OK for now.
6. We should take that update as a regular write. That's the current and ideal 
behavior. Do you see any issue?
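
To make points 2, 3 and 6 a bit more concrete, here is a rough toy model in 
Python. It is only a sketch and does not use any Hudi code: `DataBlock`, 
`DeleteBlock` and `compact` are hypothetical stand-ins (the delete block is 
loosely modelled on the idea behind `HoodieDeleteBlock`), a base file is just a 
dict, and a file slice history is just a list of such dicts. Ignoring deletes 
for keys that are not present is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class DataBlock:
    """Hypothetical log block carrying upserts: key -> row."""
    records: Dict[str, dict]


@dataclass
class DeleteBlock:
    """Hypothetical log block carrying only the keys to delete."""
    keys: List[str]


def compact(base: Dict[str, dict],
            blocks: List[Union[DataBlock, DeleteBlock]]) -> Dict[str, dict]:
    """Apply log blocks in commit order and return a NEW base file.

    The input base file is left untouched, so the older file slice still
    holds the deleted record until something (the cleaner) removes it.
    """
    merged = dict(base)
    for block in blocks:
        if isinstance(block, DeleteBlock):
            for key in block.keys:
                # Hard delete: the key is dropped, no empty record is kept.
                # Keys that are not present are ignored (sketch assumption).
                merged.pop(key, None)
        else:
            # An upsert after a delete is treated as a regular write.
            merged.update(block.records)
    return merged


# File slices, oldest first. Slice 0 is the original base file.
slices = [{"k1": {"v": 1}, "k2": {"v": 2}}]

# Delete k1: the new slice no longer has it, the old slice still does.
slices.append(compact(slices[-1], [DeleteBlock(["k1"])]))

# Delete followed by a later upsert of the same key: the record survives.
rewritten = compact(slices[0], [DeleteBlock(["k1"]),
                                DataBlock({"k1": {"v": 3}})])

# Rolling back the delete amounts to dropping the latest slice.
slices.pop()
```

In this model a rollback is nothing more than discarding the newest slice, 
which mirrors the description in point 3; the real write client of course 
deals with files, commits and the timeline rather than in-memory dicts.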

Hope that helps

> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> ------------------------------------------------------------------------
>
>                 Key: HUDI-15
>                 URL: https://issues.apache.org/jira/browse/HUDI-15
>             Project: Apache Hudi (incubating)
>          Issue Type: New Feature
>          Components: Spark datasource, Write Client
>            Reporter: Vinoth Chandar
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.5.1
>
>
> Delete API needs to be supported as first class citizen via DeltaStreamer, 
> WriteClient and datasources. Currently there are two ways to delete, soft 
> deletes and hard deletes - https://hudi.apache.org/writing_data.html#deletes. 
> We need to ensure for hard deletes, we are able to leverage 
> EmptyHoodieRecordPayload with just the HoodieKey and empty record value for 
> deleting.
> [https://github.com/uber/hudi/issues/531]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
