[ https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962041#comment-16962041 ]

sivabalan narayanan commented on HUDI-15:
-----------------------------------------

A few questions/clarifications as I work through this. I am just getting to 
know the code base, so some questions relate to the code and some to the 
design. So far, I have looked at the HoodieWriteClient code flow. 
 * I see that the COW table already supports deletes by updating with empty 
records. Are rollbacks handled automatically, without any special assistance? 
 * Can we assume compaction will retain the empty records for deleted 
entries, or will it remove them? 
 * If compaction removes the empty records, could rolling back a compaction 
get tricky, or is it a no-op, specifically with respect to deleted entries? 
 * If compaction retains the empty records, won't that become a space 
concern at some point, once most of the records are deleted? 
 * What's the expected behavior if someone tries to delete an already deleted 
entry? Should we just ignore it among the input records to be deleted? 
 * If someone tries to update a deleted entry, do we throw an exception, or 
silently ignore it and return those records to the caller? What's the current 
behavior, and what's the ideal behavior we want to get to? 
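
To reason about the first few questions, here is a minimal, hypothetical model of deleting by upserting an empty record during a copy-on-write merge. `Payload`, `ValuePayload`, `EmptyPayload` and `CowMergeModel` are illustrative names, not Hudi classes, and the model assumes that a key whose merged value is empty is simply not written to the rewritten file. It does not model rollback or MOR compaction.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a record payload (NOT Hudi's HoodieRecordPayload):
// merging the incoming payload with the stored value yields the value to
// write, or Optional.empty() if nothing should be written for that key.
interface Payload {
    Optional<String> combineWith(Optional<String> current);
}

// Ordinary upsert: the incoming value replaces whatever is stored.
final class ValuePayload implements Payload {
    private final String value;

    ValuePayload(String value) {
        this.value = value;
    }

    @Override
    public Optional<String> combineWith(Optional<String> current) {
        return Optional.of(value);
    }
}

// Delete marker: only the key (held by the caller) matters; no value is produced.
final class EmptyPayload implements Payload {
    @Override
    public Optional<String> combineWith(Optional<String> current) {
        return Optional.empty();
    }
}

final class CowMergeModel {
    // Builds the rewritten "file" (key -> value) from the old one plus a batch
    // of incoming payloads. Keys whose merged value is empty are not written,
    // so an empty payload drops the record, and an empty payload for a key
    // that is already absent leaves the output unchanged.
    static Map<String, String> cowMerge(Map<String, String> base,
                                        Map<String, Payload> incoming) {
        Map<String, String> out = new LinkedHashMap<>(base); // base is not mutated
        for (Map.Entry<String, Payload> e : incoming.entrySet()) {
            Optional<String> merged =
                e.getValue().combineWith(Optional.ofNullable(out.get(e.getKey())));
            if (merged.isPresent()) {
                out.put(e.getKey(), merged.get());
            } else {
                out.remove(e.getKey());
            }
        }
        return out;
    }
}
```

For example, merging `{k1=a, k2=b}` with a batch that deletes `k1`, deletes the absent key `k3`, and updates `k2` to `b2` yields `{k2=b2}` in this model. Note that the model would simply reinsert a previously deleted key on a later update; whether Hudi should do that is exactly the open question in the last bullet.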

BTW, I have taken note of the schema issue from the referenced link.  

> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> ------------------------------------------------------------------------
>
>                 Key: HUDI-15
>                 URL: https://issues.apache.org/jira/browse/HUDI-15
>             Project: Apache Hudi (incubating)
>          Issue Type: New Feature
>          Components: Spark datasource, Write Client
>            Reporter: Vinoth Chandar
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.5.1
>
>
> The delete API needs to be supported as a first-class citizen via DeltaStreamer, 
> WriteClient and datasources. Currently there are two ways to delete: soft 
> deletes and hard deletes (see https://hudi.apache.org/writing_data.html#deletes). 
> For hard deletes, we need to ensure we are able to leverage 
> EmptyHoodieRecordPayload with just the HoodieKey and an empty record value for 
> deleting.
> [https://github.com/uber/hudi/issues/531]
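
The soft/hard distinction described in the issue above can be sketched with a small, hypothetical Java model. `DeleteStyles`, `UpsertRecord`, `softDelete`, `hardDelete` and `apply` are illustrative names, not Hudi APIs. A soft delete keeps the key and upserts nulls for every other (nullable) column; a hard delete sends only the key with an empty value, in the spirit of `EmptyHoodieRecordPayload`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class DeleteStyles {
    // Hypothetical upsert record: a key plus optional non-key column values.
    // An empty value plays the role of the empty payload used for hard deletes.
    static final class UpsertRecord {
        final String key;
        final Optional<Map<String, Object>> value;

        UpsertRecord(String key, Optional<Map<String, Object>> value) {
            this.key = key;
            this.value = value;
        }
    }

    // Soft delete: keep the key, null out every non-key column in `row`.
    // Assumes those columns are nullable in the table schema.
    static UpsertRecord softDelete(String key, Map<String, Object> row) {
        Map<String, Object> nulled = new LinkedHashMap<>();
        for (String column : row.keySet()) {
            nulled.put(column, null);
        }
        return new UpsertRecord(key, Optional.of(nulled));
    }

    // Hard delete: only the key travels; the value is empty.
    static UpsertRecord hardDelete(String key) {
        return new UpsertRecord(key, Optional.empty());
    }

    // Applies one upsert to a toy table (key -> columns):
    // a present value overwrites the row, an empty value removes the key.
    static void apply(Map<String, Map<String, Object>> table, UpsertRecord r) {
        if (r.value.isPresent()) {
            table.put(r.key, r.value.get());
        } else {
            table.remove(r.key);
        }
    }
}
```

After applying one of each, a soft-deleted key is still present with all of its non-key columns set to null, while a hard-deleted key no longer appears in the table at all.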



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
