[
https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975868#comment-16975868
]
sivabalan narayanan commented on HUDI-15:
-----------------------------------------
[~vinoth] [~vbalaji]: I have some questions on supporting delete in
HoodieDeltaStreamer.
Ignoring scheduling, compaction, and so on, the core skeleton looks as below.
{code:java}
// Simplified skeleton of a single sync round (pseudocode)
syncOnce() {
  val hoodieRecords = readFromSource(...)
  writeToSink(hoodieRecords, ...)
}
{code}
So, I am trying to see what delete would look like:
- Will readFromSource return HoodieKeys or HoodieRecords? If it is going to
return HoodieRecords, is it okay to pass the records as is to WriteClient
and set the operation type to Delete? Or should we strip off just the
HoodieKeys and call writeClient.delete(HoodieKeys)?
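To make the two options concrete, here is a minimal sketch. It uses hypothetical stand-in types (Key and Record, not the real HoodieKey and HoodieRecord), and a null payload stands in for an empty payload. It only illustrates the shape of the data in each case, not how the write client would actually behave.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the real Hudi classes.
final class DeleteSketch {

    // Stand-in for a HoodieKey: identifies a record by key and partition path.
    static final class Key {
        final String recordKey;
        final String partitionPath;

        Key(String recordKey, String partitionPath) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
        }
    }

    // Stand-in for a HoodieRecord: a key plus a payload.
    // Assumption: a null payload plays the role of an empty payload.
    static final class Record {
        final Key key;
        final String payload;

        Record(Key key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Option 1: strip off just the keys, to be handed to a key-based delete call.
    static List<Key> extractKeys(List<Record> records) {
        List<Key> keys = new ArrayList<>();
        for (Record r : records) {
            keys.add(r.key);
        }
        return keys;
    }

    // Option 2: keep passing records, but with the payload emptied,
    // so the caller would only need to set the operation type to delete.
    static List<Record> toDeleteRecords(List<Record> records) {
        List<Record> deletes = new ArrayList<>();
        for (Record r : records) {
            deletes.add(new Record(r.key, null));
        }
        return deletes;
    }

    public static void main(String[] args) {
        List<Record> fromSource = new ArrayList<>();
        fromSource.add(new Record(new Key("id-1", "2019/11/16"), "some-value"));

        System.out.println("keys to delete: " + extractKeys(fromSource).size());
        System.out.println("records with empty payload: " + toDeleteRecords(fromSource).size());
    }
}
```

Either way the source would still have to supply the record key and partition path; the difference is only whether the stripping happens before or inside the write call.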
> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> ------------------------------------------------------------------------
>
> Key: HUDI-15
> URL: https://issues.apache.org/jira/browse/HUDI-15
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Components: Spark datasource, Write Client
> Reporter: Vinoth Chandar
> Assignee: sivabalan narayanan
> Priority: Major
> Fix For: 0.5.1
>
>
> Delete API needs to be supported as first class citizen via DeltaStreamer,
> WriteClient and datasources. Currently there are two ways to delete, soft
> deletes and hard deletes - https://hudi.apache.org/writing_data.html#deletes.
> We need to ensure for hard deletes, we are able to leverage
> EmptyHoodieRecordPayload with just the HoodieKey and empty record value for
> deleting.
> [https://github.com/uber/hudi/issues/531]