[ https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965678#comment-16965678 ]
sivabalan narayanan commented on HUDI-15:
-----------------------------------------
Thanks for the heads up. I guess I get the requirement now.
I went through the code path for deletes. A few follow-up questions:
* Are we looking to introduce a class for deletes, similar to HoodieRecord
(used for inserts and updates) and HoodieMergeHandle? Or is our intention just
to add a new delete API for external-facing clients and touch the internal
pieces as little as possible? I am a bit wary of touching HoodieRecord, since
it is used across the board.
* Regarding the schema fix, here is what I am thinking:
** Fix LogReaderUtils.readSchemaFromLogFileInReverse() to iterate over the log
blocks in reverse, find the first non-delete block, and return its schema.
** I know we have a corner case here too: if all blocks are delete blocks, we
will have to fetch the schema from the base file and return that.
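
To make the proposed schema lookup concrete, here is a rough sketch of the reverse scan with the base-file fallback. Everything in it is a hypothetical stand-in, not the actual Hudi code: LogBlock, schemaFromLogBlocksInReverse() and resolveSchema() are made-up names, schemas are plain strings, and I am assuming blocks are listed oldest to newest and that a delete block carries no usable schema.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SchemaLookupSketch {

    // Hypothetical stand-in for a log block; not the real Hudi log block class.
    static final class LogBlock {
        final boolean isDelete;
        final String schema; // assumed null for delete blocks (keys only)

        LogBlock(boolean isDelete, String schema) {
            this.isDelete = isDelete;
            this.schema = schema;
        }
    }

    // Walk the blocks newest-first and return the schema of the first
    // non-delete block, or empty if every block is a delete block.
    static Optional<String> schemaFromLogBlocksInReverse(List<LogBlock> blocks) {
        for (int i = blocks.size() - 1; i >= 0; i--) {
            LogBlock block = blocks.get(i);
            if (!block.isDelete && block.schema != null) {
                return Optional.of(block.schema);
            }
        }
        return Optional.empty();
    }

    // Corner case: if the log holds only delete blocks, fall back to the
    // schema read from the base file (passed in here as a plain string).
    static String resolveSchema(List<LogBlock> blocks, String baseFileSchema) {
        return schemaFromLogBlocksInReverse(blocks).orElse(baseFileSchema);
    }

    public static void main(String[] args) {
        List<LogBlock> mixed = Arrays.asList(
                new LogBlock(false, "schema-v1"),
                new LogBlock(false, "schema-v2"),
                new LogBlock(true, null));
        List<LogBlock> onlyDeletes = Arrays.asList(
                new LogBlock(true, null),
                new LogBlock(true, null));

        System.out.println(resolveSchema(mixed, "base-schema"));       // prints "schema-v2"
        System.out.println(resolveSchema(onlyDeletes, "base-schema")); // prints "base-schema"
    }
}
```

Returning an Optional from the scan keeps the "all blocks are deletes" case explicit, so the caller decides when to pay the cost of reading the base file.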
> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> ------------------------------------------------------------------------
>
> Key: HUDI-15
> URL: https://issues.apache.org/jira/browse/HUDI-15
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Components: Spark datasource, Write Client
> Reporter: Vinoth Chandar
> Assignee: sivabalan narayanan
> Priority: Major
> Fix For: 0.5.1
>
>
> Delete API needs to be supported as a first-class citizen via DeltaStreamer,
> WriteClient and datasources. Currently there are two ways to delete: soft
> deletes and hard deletes - https://hudi.apache.org/writing_data.html#deletes.
> We need to ensure that, for hard deletes, we are able to leverage
> EmptyHoodieRecordPayload with just the HoodieKey and an empty record value for
> deleting.
> [https://github.com/uber/hudi/issues/531]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)