[
https://issues.apache.org/jira/browse/HUDI-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pramod Biligiri updated HUDI-4994:
----------------------------------
Description:
Datahub supports soft deletes: the entity remains in the database with its
status aspect set to removed:true. Such an entity can be re-ingested later with
new properties, overwriting the older ones. The current implementation in
DatahubSyncTool does not handle this scenario: it fails to reset the status
flag to removed:false during ingest, so the entity does not surface in the
Datahub UI at all.
Ref: See sections on Soft Delete and Hard Delete in the Datahub docs:
https://datahubproject.io/docs/how/delete-metadata/#soft-delete-the-default
was:
When DatahubSyncTool updates an entity in Datahub using an UPSERT request of
Datahub's RestEmitter client, it can be assumed that the entity is no longer
considered deleted, and needs to be discoverable henceforth in the Datahub UI.
For that, it is necessary to explicitly set the "status" metadata aspect of the
entity to "{'removed':false}". This handles the situation where the entity
may have been (soft) deleted in the past. Adding "removed:false" to the
"status" aspect has no impact on newly created entities, or on hard-deleted
entities (of which no trace remains anyway).
Ref: See sections on Soft Delete and Hard Delete in the Datahub docs:
https://datahubproject.io/docs/how/delete-metadata/#soft-delete-the-default
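The behaviour described above can be illustrated with a minimal sketch. It
uses a hypothetical in-memory store, not the Datahub client API: the class
SoftDeleteSketch, its methods, the example URN and the property values are
all assumptions made for illustration. It only shows why an upsert that omits
the "status" aspect leaves a soft-deleted entity hidden, and why writing
removed:false alongside the properties makes it visible again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a Datahub-like metadata store.
// This is NOT the Datahub client API; it only models aspects keyed by URN.
public class SoftDeleteSketch {
    private static final String REMOVED_TRUE = "{\"removed\":true}";
    private static final String REMOVED_FALSE = "{\"removed\":false}";

    // urn -> (aspect name -> aspect value)
    private final Map<String, Map<String, String>> aspects = new HashMap<>();

    // Upsert a single aspect; other aspects of the entity are left untouched.
    public void upsertAspect(String urn, String aspectName, String value) {
        aspects.computeIfAbsent(urn, k -> new HashMap<>()).put(aspectName, value);
    }

    // Soft delete: the entity stays in the store, only its status changes.
    public void softDelete(String urn) {
        upsertAspect(urn, "status", REMOVED_TRUE);
    }

    // An entity surfaces only if it exists and is not flagged as removed.
    public boolean isVisible(String urn) {
        Map<String, String> entity = aspects.get(urn);
        return entity != null && !REMOVED_TRUE.equals(entity.get("status"));
    }

    // Sync that writes properties only: a soft-deleted entity stays hidden.
    public void syncWithoutStatus(String urn, String properties) {
        upsertAspect(urn, "datasetProperties", properties);
    }

    // Sync that also resets the status aspect on every upsert.
    public void syncWithStatus(String urn, String properties) {
        upsertAspect(urn, "datasetProperties", properties);
        upsertAspect(urn, "status", REMOVED_FALSE);
    }

    public static void main(String[] args) {
        SoftDeleteSketch store = new SoftDeleteSketch();
        String urn = "urn:li:dataset:example"; // made-up URN for illustration

        store.syncWithStatus(urn, "v1");
        store.softDelete(urn);

        store.syncWithoutStatus(urn, "v2");
        System.out.println("visible after plain upsert: " + store.isVisible(urn));

        store.syncWithStatus(urn, "v3");
        System.out.println("visible after upsert with status: " + store.isVisible(urn));
    }
}
```

In this sketch, resetting the status on every sync is harmless for an entity
that was never deleted, since the aspect is simply written with the value it
would have had anyway.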
Summary: DatahubSyncTool does not correctly re-ingest soft-deleted
entities (was: DatahubSyncTool should set "removed" status of an entity to
false when updating it)
> DatahubSyncTool does not correctly re-ingest soft-deleted entities
> ------------------------------------------------------------------
>
> Key: HUDI-4994
> URL: https://issues.apache.org/jira/browse/HUDI-4994
> Project: Apache Hudi
> Issue Type: Task
> Components: meta-sync
> Reporter: Pramod Biligiri
> Priority: Major
> Labels: pull-request-available
>