[
https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971243#comment-16971243
]
Vinoth Chandar commented on HUDI-15:
------------------------------------
We need both (a) and (b).
In the datasource, we convert the DataFrame's schema to Avro and set that. You
can look into how best to special-case deletes in that code path. For the
schema, you can probably pass in the actual DataFrame schema for now to avoid
NPEs, but any validation in HoodieMergeHandle may give you some trouble.
Another way is to introduce a new HoodieDeleteHandle to do this cleanly without
relying on HoodieMergeHandle (which would then handle just soft deletes and
upserts).
Also, from a usability standpoint, we can think about adding a method to
`HoodieDataSourceHelpers` for deletes to make some of this usage simpler. That
could be a follow-up.
> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> ------------------------------------------------------------------------
>
> Key: HUDI-15
> URL: https://issues.apache.org/jira/browse/HUDI-15
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Components: Spark datasource, Write Client
> Reporter: Vinoth Chandar
> Assignee: sivabalan narayanan
> Priority: Major
> Fix For: 0.5.1
>
>
> Delete API needs to be supported as a first-class citizen via DeltaStreamer,
> WriteClient, and datasources. Currently there are two ways to delete: soft
> deletes and hard deletes - https://hudi.apache.org/writing_data.html#deletes.
> For hard deletes, we need to ensure we can leverage EmptyHoodieRecordPayload
> with just the HoodieKey and an empty record value.
> [https://github.com/uber/hudi/issues/531]
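The hard-delete mechanism described above can be illustrated with a minimal,
self-contained sketch (plain Java; these are illustrative stand-ins, not the
actual Hudi classes): a tombstone payload carries only the key with an empty
value, and the merge step drops any record whose payload resolves to empty,
which is the role EmptyHoodieRecordPayload plays in Hudi.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Minimal stand-in for a Hudi record payload: an empty Optional signals a hard delete.
interface Payload {
    Optional<String> getValue();
}

// Mirrors the role of EmptyHoodieRecordPayload: key only, no record value.
class EmptyPayload implements Payload {
    public Optional<String> getValue() { return Optional.empty(); }
}

// Ordinary upsert payload carrying a concrete value.
class DataPayload implements Payload {
    private final String value;
    DataPayload(String value) { this.value = value; }
    public Optional<String> getValue() { return Optional.of(value); }
}

public class DeleteSketch {
    // Merge incoming records into the existing view; empty payloads delete the key.
    static Map<String, String> merge(Map<String, String> existing, Map<String, Payload> incoming) {
        Map<String, String> merged = new HashMap<>(existing);
        incoming.forEach((key, payload) -> {
            Optional<String> v = payload.getValue();
            if (v.isPresent()) {
                merged.put(key, v.get()); // upsert
            } else {
                merged.remove(key);       // hard delete
            }
        });
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>(Map.of("k1", "a", "k2", "b"));
        Map<String, Payload> incoming = Map.of(
            "k1", new EmptyPayload(),    // delete k1
            "k3", new DataPayload("c")); // insert k3
        System.out.println(merge(existing, incoming)); // k1 gone, k2 kept, k3 added
    }
}
```

A dedicated HoodieDeleteHandle, as suggested in the comment, would keep this
delete branch out of the upsert merge path entirely.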
--
This message was sent by Atlassian Jira
(v8.3.4#803005)