[
https://issues.apache.org/jira/browse/HUDI-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krishen Bhan updated HUDI-7655:
-------------------------------
Description:
When a HUDI clean plan is executed, any targeted file that was not confirmed as
deleted (or already non-existent) will be marked as a "failed delete". Although
these failed deletes are added to `.clean` metadata, if incremental clean is
used then these files may never be picked up again by a future clean plan,
unless a "full-scan" clean ends up being scheduled. In addition to files
unnecessarily taking up storage space for longer, this can lead to the
following dataset consistency issue for COW datasets:
# Insert at C1 creates file group f1 in a partition
# Replacecommit at RC2 creates file group f2 in the partition, and replaces f1
# Any reader of the partition that calls the HUDI API (with or without using
MDT) will recognize that f1 should be ignored, as it has been replaced. This is
because the RC2 instant file is in the active timeline
# Some completed instants later, an incremental clean is scheduled. It moves
the "earliest commit to retain" to a time after instant time RC2, so it
targets f1 for deletion. But during execution of the plan, it fails to delete
f1.
# An archive job is eventually triggered, and archives C1 and RC2. Note that
f1 is still in the partition
At this point, any job/query that reads the aforementioned partition directly
via DFS file system calls (without using the MDT FILES partition) will consider
both f1 and f2 as valid file groups, since RC2 is no longer in the active
timeline. This is a data consistency issue, and will only be resolved if a
"full-scan" clean is triggered and deletes f1.
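The visibility problem in the steps above can be simulated with a small
self-contained sketch (the class, field, and method names below are
illustrative stand-ins, not actual Hudi APIs): a reader that lists file groups
straight from storage and filters out only the groups replaced by a
replacecommit still present in the active timeline.

```java
import java.util.*;

// Illustrative sketch (not Hudi code): models a reader that lists a partition
// directly from DFS and masks out file groups replaced by any replacecommit
// that is still in the active timeline.
public class ReaderVisibilitySketch {
    // File groups physically present in the partition: f1 was targeted by a
    // clean, but the delete failed, so it still sits on storage next to f2.
    static final Set<String> filesOnStorage =
            new HashSet<>(Arrays.asList("f1", "f2"));

    // activeReplaceCommits maps a replacecommit instant in the ACTIVE timeline
    // to the file groups it replaced. Archival removes entries from this map.
    static Set<String> visibleFileGroups(Map<String, Set<String>> activeReplaceCommits) {
        Set<String> replaced = new HashSet<>();
        activeReplaceCommits.values().forEach(replaced::addAll);
        Set<String> visible = new HashSet<>(filesOnStorage);
        visible.removeAll(replaced);
        return visible;
    }

    public static void main(String[] args) {
        // Before archival: RC2 is in the active timeline, so f1 is filtered out.
        Map<String, Set<String>> active = new HashMap<>();
        active.put("RC2", Collections.singleton("f1"));
        System.out.println(visibleFileGroups(active));        // only f2

        // After RC2 is archived, the replacement info is gone, and the reader
        // incorrectly treats both f1 and f2 as valid file groups.
        active.remove("RC2");
        System.out.println(visibleFileGroups(active));        // f1 and f2
    }
}
```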
This specific scenario can be avoided if the user can configure HUDI clean to
fail execution of a clean plan unless all files are confirmed as deleted (or
already absent from DFS), "blocking" the clean. The next clean attempt will
re-execute this existing plan, since clean plans cannot be "rolled back".
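The proposed behavior could look roughly like the following sketch. The
configuration name and method shown here are hypothetical, not existing Hudi
options: when the flag is enabled, clean execution throws instead of recording
failed deletes, leaving the plan pending so the next clean re-executes it.

```java
import java.util.*;

// Hypothetical sketch of the proposed config behavior (names are illustrative,
// not real Hudi options): with failOnFailedDelete enabled, the clean throws on
// any failed delete rather than committing a .clean with failed deletes, so
// the still-pending plan is re-executed by the next clean attempt.
public class FailOnFailedDeleteSketch {
    static List<String> executeCleanPlan(List<String> targets,
                                         Set<String> undeletable,
                                         boolean failOnFailedDelete) {
        List<String> failedDeletes = new ArrayList<>();
        for (String file : targets) {
            // Stand-in for the DFS delete call; membership in "undeletable"
            // simulates a delete that could not be confirmed.
            boolean deleted = !undeletable.contains(file);
            if (!deleted) {
                failedDeletes.add(file);
            }
        }
        if (failOnFailedDelete && !failedDeletes.isEmpty()) {
            // "Blocks" the clean: the plan is not completed, so a later clean
            // attempt must re-execute this same plan (plans cannot be rolled back).
            throw new IllegalStateException("Clean failed to delete: " + failedDeletes);
        }
        // Today's behavior: failed deletes are simply recorded in .clean metadata.
        return failedDeletes;
    }
}
```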
was:
When a HUDI clean plan is executed, any targeted file that was not confirmed as
deleted (or already non-existent) will be marked as a "failed delete". Although
these failed deletes are added to `.clean` metadata, if incremental clean is
used then these files may never be picked up again by a future clean plan,
unless a "full-scan" clean ends up being scheduled. In addition to files
unnecessarily taking up storage space for longer, this can lead to the
following dataset consistency issue for COW datasets:
# Insert at C1 creates file group f1 in a partition
# Replacecommit at RC2 creates file group f2 in the partition, and replaces f1
# Any reader of the partition that calls the HUDI API (with or without using
MDT) will recognize that f1 should be ignored, as it has been replaced. This is
because the RC2 instant file is in the active timeline
# Some completed instants later, an incremental clean is scheduled. It moves
the "earliest commit to retain" to a time after instant time RC2, so it
targets f1 for deletion. But during execution of the plan, it fails to delete
f1.
# An archive job is eventually triggered, and archives C1. Note that f1 is
still in the partition
At this point, any job/query that reads the aforementioned partition directly
via DFS file system calls (without using the MDT FILES partition) will consider
both f1 and f2 as valid file groups, since RC2 is no longer in the active
timeline. This is a data consistency issue, and will only be resolved if a
"full-scan" clean is triggered and deletes f1.
This specific scenario can be avoided if the user can configure HUDI clean to
fail execution of a clean plan unless all files are confirmed as deleted (or
already absent from DFS), "blocking" the clean. The next clean attempt will
re-execute this existing plan, since clean plans cannot be "rolled back".
> Support configuration for clean to fail execution if at least one file is
> marked as a failed delete
> ------------------------------------------------------------------------------------------------------------
>
> Key: HUDI-7655
> URL: https://issues.apache.org/jira/browse/HUDI-7655
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Krishen Bhan
> Priority: Minor
> Labels: clean
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)