Marta Kuczora created HIVE-22336:
------------------------------------
Summary: The updates should be pushed to the Metastore backend DB
before creating the notification event
Key: HIVE-22336
URL: https://issues.apache.org/jira/browse/HIVE-22336
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 4.0.0
Reporter: Marta Kuczora
There was an issue on HDP-3.1 where a table couldn't be deleted because some
related objects (such as the storage descriptor) were missing from the metastore.
A previous delete attempt on that table had failed, but no rollback happened,
which is why the SD was missing. In that previous delete, the notification
creation swallowed the error that came from the backend DB, so no rollback was
triggered. These are the steps of the first delete attempt:
# Open a transaction (transaction_1) - this step was successful
# Delete all the objects which are related to the table - this step was
successful too, so the SD and other objects were deleted
# Delete the table - this step failed in the backend DB, but according to the
log the delete is part of a batch statement, so it isn't necessarily executed
at this moment and no error is seen here
# Create a notification about the table delete:
## Open another transaction for the notification creation (transaction_2) -
this calls the ObjectStore.openTransaction method, which increments a counter
of open transactions and then checks whether there is already an active
transaction. If there is, it just returns true and doesn't create a new one.
## Lock the notification id in the metastore backend db for update - here is
where the exception from the backend DB (let's call it "MySQL Exception")
manifests
## If an exception occurs while acquiring the lock, retry - The "MySQL
Exception" was caught, and since there is no check on the exception, the retry
mechanism assumed it happened because the lock for the notification id couldn't
be acquired, so it retried and "forgot" about the "MySQL Exception".
## If the lock was acquired successfully, create the notification - On the
second attempt the lock was acquired successfully, so the notification creation
succeeded.
## Commit transaction_2 - This just decrements the transaction counter and
doesn't actually commit anything.
# Commit transaction_1 - This commits the transaction, but since the error had
already manifested and was effectively "handled", no error is seen here. The
commit is reported as successful, so no rollback happens and the table object
is left in an invalid state.
# If the commit was not successful then rollback
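
The sequence above can be reproduced with a small self-contained model. This is
only a sketch under stated assumptions: FakeBackend, SwallowedErrorSketch and
the statement strings are hypothetical names invented for illustration, not
Hive or DataNucleus APIs, and the backend is reduced to a queue whose contents
are sent on the next round trip. It shows how a nested transaction counter
combined with a catch-all retry can let the outer commit report success after
a deferred failure.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the backend DB; none of these names are Hive APIs.
class FakeBackend {
    private final Deque<String> batch = new ArrayDeque<>();
    boolean sdDeleted = false;
    boolean tableDeleted = false;
    boolean committed = false;

    // Step 2: related objects are deleted immediately and successfully.
    void deleteRelatedObjects() {
        sdDeleted = true;
    }

    // Step 3: a batched statement is only queued, so no error can surface here.
    void queue(String statement) {
        batch.add(statement);
    }

    // Assumption: every round trip sends the queued batch first, so a deferred
    // failure surfaces on whichever call happens to trigger it.
    private void sendBatch() {
        while (!batch.isEmpty()) {
            String statement = batch.poll();
            if (statement.equals("DELETE TBLS")) {
                throw new IllegalStateException("MySQL Exception on " + statement);
            }
        }
    }

    void lockNotificationId() {
        sendBatch();
    }

    void commit() {
        sendBatch();
        committed = true;
    }
}

class SwallowedErrorSketch {
    private final FakeBackend db;
    private int openTransactions = 0;

    SwallowedErrorSketch(FakeBackend db) {
        this.db = db;
    }

    // A nested open only bumps the counter; no second transaction is created.
    boolean openTransaction() {
        openTransactions++;
        return true;
    }

    // Only the outermost commit reaches the backend.
    boolean commitTransaction() {
        openTransactions--;
        if (openTransactions == 0) {
            db.commit();
        }
        return true;
    }

    void addNotification() {
        openTransaction(); // "transaction_2"
        for (int attempt = 1; attempt <= 2; attempt++) {
            try {
                db.lockNotificationId();
                break;
            } catch (RuntimeException e) {
                // Treated as a lock failure and retried; the real cause is lost.
            }
        }
        db.queue("INSERT NOTIFICATION_LOG");
        commitTransaction(); // only decrements the counter
    }

    boolean dropTable() {
        openTransaction(); // "transaction_1"
        db.deleteRelatedObjects();
        db.queue("DELETE TBLS");
        addNotification();
        return commitTransaction();
    }

    public static void main(String[] args) {
        FakeBackend db = new FakeBackend();
        boolean ok = new SwallowedErrorSketch(db).dropTable();
        System.out.println("commit reported success: " + ok);
        System.out.println("SD deleted: " + db.sdDeleted
                + ", table deleted: " + db.tableDeleted);
    }
}
```

In this model the commit reports success while the SD is gone and the table
row is still present, which matches the invalid state described above.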
In the customer setup, this issue could be fixed by adding a flush call before
creating the notification event, so that all the updates are pushed to the
backend DB and the error manifests at that point. The error would then
propagate back to the HiveMetastore class, which would do the rollback, and the
delete table operation would fail as it should, since the table couldn't be
deleted. The HiveMetastore retry mechanism could then try the table deletion
again.
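
A minimal sketch of the proposed ordering, again using hypothetical names
(FlushableBackend and FlushBeforeNotificationSketch are invented for
illustration and are not Hive classes). It assumes that flushing sends all
queued statements immediately, so the deferred failure surfaces before any
notification work begins and reaches the caller's rollback handling. In a
JDO-based store the corresponding call would presumably be
PersistenceManager.flush(), but where exactly it belongs in ObjectStore is not
specified here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the backend DB; none of these names are Hive APIs.
class FlushableBackend {
    private final Deque<String> batch = new ArrayDeque<>();
    boolean committed = false;
    boolean rolledBack = false;

    void queue(String statement) {
        batch.add(statement);
    }

    // Push all queued statements now, so a deferred failure surfaces here.
    void flush() {
        while (!batch.isEmpty()) {
            String statement = batch.poll();
            if (statement.equals("DELETE TBLS")) {
                throw new IllegalStateException("MySQL Exception on " + statement);
            }
        }
    }

    void commit() {
        flush();
        committed = true;
    }

    void rollback() {
        batch.clear();
        rolledBack = true;
    }
}

class FlushBeforeNotificationSketch {
    // Returns true if the drop committed, false if it was rolled back.
    static boolean dropTable(FlushableBackend db) {
        try {
            db.queue("DELETE TBLS");
            db.flush(); // the proposed extra step: fail here, not later
            db.queue("INSERT NOTIFICATION_LOG"); // reached only if the flush succeeded
            db.commit();
            return true;
        } catch (RuntimeException e) {
            // The error reaches the caller's handler, which rolls back.
            db.rollback();
            return false;
        }
    }

    public static void main(String[] args) {
        FlushableBackend db = new FlushableBackend();
        System.out.println("drop succeeded: " + dropTable(db));
        System.out.println("rolled back: " + db.rolledBack);
    }
}
```

Because the flush happens outside the notification's retry loop, the exception
is never mistaken for a lock failure, and a retry of the whole delete starts
from a rolled-back state.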
--
This message was sent by Atlassian Jira
(v8.3.4#803005)