RussellSpitzer opened a new issue #2317: URL: https://github.com/apache/iceberg/issues/2317
The current logic for doing a commit is, at a high level, as follows:

1. Write all files, including a new-metadata.json, for the operation.
2. Acquire a lock.
3. Swap the pointer from old-metadata.json to new-metadata.json.
4. Release the lock.

If we fail during step 3 we always attempt to clean up the files we created in step 1. This is a problem when step 3 (the swap) has succeeded server side but the client has not received an acknowledgment. It leads to a state where:

1. The catalog is pointing to new-metadata.json.
2. Our client is actively removing old-metadata.json and all files which were added for the operation.

Future clients which are able to contact the HMS will see the new-metadata.json location, attempt to read it, and fail. What's worse, if multiple clients are working with this table, there is a window of time in which a second client can read new-metadata.json before it is removed and can build a new-new-metadata.json based on it. The new-new-metadata.json will then reference files which are in the process of being removed by the first client.

To avoid this we need to handle failures in step 3 of table commits slightly differently. Basically, we need to group failures into two categories:

1. Failures that the catalog reports because it could not perform the operation for some reason.
2. Failures where the client has lost contact for some reason (basically everything else).

Type 1 failures can be cleaned up: we know the commit did not succeed, and the files we have generated are more or less useless. Type 2 failures must still be reported as failures, but cannot be retried and cannot be cleaned up (see the sketch below). We do not know whether the catalog pointer was swapped to the new metadata.json or not, and thus we cannot resolve the situation until communication with the HMS is restored. Since we are usually talking about a client perspective here, I believe the right thing to do is to fail and let the user know they need to manually check and clean up files if necessary.

I haven't checked other catalog implementations to see if they have similar vulnerabilities, so we should probably check those as well. The HMS code in question is [here](https://github.com/apache/iceberg/blob/c7ff6d51377eaae7f476e0c0730dea3b5a84fabb/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L214-L229).

CC: @aokolnychyi, @karuppayya, @raptond
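To illustrate the proposed split, here is a minimal sketch in the spirit of `HiveTableOperations.commit()`, not the actual code. The helpers `swapMetadataPointer` and `cleanupNewMetadata` and the `UnknownCommitStateException` type are hypothetical stand-ins; only `CommitFailedException` is an existing Iceberg exception type.

```java
import org.apache.iceberg.exceptions.CommitFailedException;

class CommitSketch {

  /** Hypothetical: thrown when we cannot tell whether the catalog applied the swap. */
  static class UnknownCommitStateException extends RuntimeException {
    UnknownCommitStateException(String msg, Throwable cause) {
      super(msg, cause);
    }
  }

  void commit(String oldMetadataLocation, String newMetadataLocation) {
    try {
      // Step 3: ask the catalog (HMS here) to swap the metadata pointer.
      swapMetadataPointer(oldMetadataLocation, newMetadataLocation);
    } catch (CommitFailedException e) {
      // Type 1: the catalog definitively rejected the swap, so the new
      // metadata files are unreachable and safe to delete.
      cleanupNewMetadata(newMetadataLocation);
      throw e;
    } catch (RuntimeException e) {
      // Type 2: we lost contact mid-request; the swap may or may not have
      // happened server side. Do NOT retry and do NOT clean up; surface
      // the ambiguity to the caller instead.
      throw new UnknownCommitStateException(
          "Cannot determine whether the commit of " + newMetadataLocation
              + " succeeded; manual inspection may be required", e);
    }
  }

  // Hypothetical stand-in for the real HMS alter-table call.
  private void swapMetadataPointer(String from, String to) {}

  // Hypothetical stand-in for deleting the files written in step 1.
  private void cleanupNewMetadata(String location) {}
}
```

Note that the unknown-state branch has to come last and catch broadly, since any transport-level failure (for example a Thrift `TException` wrapped in a runtime exception) leaves the commit outcome ambiguous, and only failures the catalog itself reports as rejections can safely trigger cleanup.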
