Hi Arina, thanks for reporting this, and for the thorough write-up in the
issue!

I suspect that this has something to do with PR #218
<https://github.com/apache/incubator-iceberg/pull/218> that introduced
special handling for files that are deleted in transactions. The problem
that PR fixed was that a manifest was created, merged, and then deleted.
Then the transaction failed to commit and retried. The manifest that was
created was reused, but in the retry it didn’t get merged and was still a
valid metadata file. Since the file had been deleted on the first try, the
table was missing a manifest.

The fix was to introduce a lazy delete for cleaning up. The transaction
keeps track of files to delete and deletes them after the commit succeeds.
What might be happening here is that the first commit attempt is out of
date and retries, and then the original manifest is not deleted on the
second attempt. Looking at the cleanup code, I think this is the problem
because the filtered manifest cache is cleared as files are deleted:
https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L336
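
To illustrate the lazy delete described above, here is a minimal sketch. It
is not the actual Iceberg code: the class and method names (DeferredDeletes,
deleteLater, commitSucceeded) are made up for this example, and the only
assumption is that deletes are recorded during the transaction and run once
the commit has succeeded.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch of deferred cleanup; not Iceberg's real classes.
class DeferredDeletes {
  // Paths recorded for deletion; nothing is removed until the commit succeeds.
  private final Set<String> pending = new LinkedHashSet<>();
  // The function that actually removes a file.
  private final Consumer<String> deleteFunc;

  DeferredDeletes(Consumer<String> deleteFunc) {
    this.deleteFunc = deleteFunc;
  }

  // Called wherever an eager delete would otherwise have happened.
  void deleteLater(String path) {
    pending.add(path);
  }

  // Called only after the commit has succeeded.
  void commitSucceeded() {
    pending.forEach(deleteFunc);
    pending.clear();
  }

  int pendingCount() {
    return pending.size();
  }
}
```

With this shape, a file that a retry still needs survives a failed attempt,
because nothing is removed before commitSucceeded() runs.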

I think the fix is to add a list of files that should be deleted on every
attempt. When the filtered cache is cleared, each file should be deleted
and moved to the delete list. That way future attempts also delete the
files.
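
Roughly, I am thinking of something like the sketch below. This is only an
illustration of the idea, not a patch: RetryCleanupSketch, cacheFiltered,
cleanUp, and the file names are hypothetical, and I am assuming the delete
function handed to the producer is the transaction's lazy delete.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of the proposed fix; not the real MergingSnapshotProducer.
class RetryCleanupSketch {
  // Filtered manifests written by the current attempt, keyed by source manifest.
  private final Map<String, String> filteredCache = new HashMap<>();
  // Files that must be deleted on every attempt; survives clearing the cache.
  private final List<String> deleteOnEveryAttempt = new ArrayList<>();
  // Assumed to be the transaction's delete function.
  private final Consumer<String> deleteFunc;

  RetryCleanupSketch(Consumer<String> deleteFunc) {
    this.deleteFunc = deleteFunc;
  }

  void cacheFiltered(String sourceManifest, String filteredManifest) {
    filteredCache.put(sourceManifest, filteredManifest);
  }

  // Runs once per commit attempt.
  void cleanUp() {
    // Move cached files to the delete list before the cache is cleared.
    deleteOnEveryAttempt.addAll(filteredCache.values());
    filteredCache.clear();
    // Request deletion of everything tracked so far, including files from
    // earlier attempts that are no longer in the cache.
    deleteOnEveryAttempt.forEach(deleteFunc);
  }
}
```

The point is that clearing the cache no longer loses track of a file: a
second call to cleanUp() with an empty cache still asks for the first
attempt's files to be deleted.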

rb

On Tue, Jul 30, 2019 at 12:48 PM Arina Yelchiyeva <
[email protected]> wrote:

> Hi all,
>
> I have noticed that when a delete operation is performed in a transaction and
> there are at least two snapshots prior to the delete in the Iceberg table,
> the delete produces two manifest files, one of which is orphaned. Note that
> if the delete is not performed in a transaction, everything works fine.
>
> Orphaned manifest files are subsequently not deleted during snapshot
> expiration and keep piling up.
> I have described the issue in more detail in
> https://github.com/apache/incubator-iceberg/issues/330.
>
> Maybe someone has an idea why the orphan file is created?
>
> Kind regards,
> Arina
>


-- 
Ryan Blue
Software Engineer
Netflix
