[
https://issues.apache.org/jira/browse/HIVE-22336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954935#comment-16954935
]
Dinesh Chitlangia commented on HIVE-22336:
------------------------------------------
[~kuczoram] Thanks for filing this patch. Latest patch looks clean.
> The updates should be pushed to the Metastore backend DB before creating the
> notification event
> -----------------------------------------------------------------------------------------------
>
> Key: HIVE-22336
> URL: https://issues.apache.org/jira/browse/HIVE-22336
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 4.0.0
> Reporter: Marta Kuczora
> Assignee: Marta Kuczora
> Priority: Major
> Attachments: HIVE-22336.1.patch, HIVE-22336.2.patch,
> HIVE-22336.3.patch
>
>
> There was an issue on HDP-3.1 where a table couldn't be deleted because some
> related objects (like the storage descriptor) were missing from the metastore.
> A previous delete attempt on that table had gone wrong, but no rollback
> happened, which is why the SD was missing. In that previous delete, the
> notification creation swallowed the error that came from the backend DB, so
> no rollback was triggered. Here are the steps that happened in the first
> delete attempt:
>
> # Open a transaction (transaction_1) - this step was successful
> # Delete all the objects which are related to the table - this step was
> successful too, so the SD and other objects were deleted
> # Delete the table - this step failed in the backend DB, but according to the
> log the delete happens in a batch statement. The statement is not necessarily
> executed at this moment, so no error is visible here
> # Create a notification about the table delete:
> ## Open another transaction for the notification creation (transaction_2) -
> this calls the ObjectStore.openTransaction method, which increments a counter
> for open transactions and then checks if there is already an active
> transaction. If there is, it just returns true and doesn't really create a
> new transaction.
> ## Lock the notification id in the metastore backend db for update - here is
> where the exception from the backend DB (let's call it "MySQL Exception")
> manifests
> ## If an exception occurs while acquiring the lock, retry - The "MySQL
> Exception" was caught, and since there is no check on the exception, the retry
> mechanism assumes it happened because the lock for the notification id
> couldn't be acquired, so it retries and "forgets" about the "MySQL Exception".
> ## If the lock was acquired successfully, create the notification - Second
> time, the lock was acquired successfully, so the notification creation was
> successful.
> ## Commit transaction_2 - This just decrements the transaction counter, but
> doesn't actually commit anything.
> # Commit transaction_1 - This commits the transaction, but since the error
> has already manifested and been "handled", no error is seen here, only that
> the commit was successful. No rollback happens, and the table object is left
> in an invalid state.
> # If the commit was not successful then rollback
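
The steps above can be reproduced in a small, self-contained sketch. Everything in it is hypothetical: ToyStore, lockNotificationId and dropTableWithBlindRetry are illustrative names, not the real ObjectStore API, and IllegalStateException stands in for the "MySQL Exception". It only models three behaviours described in the list: batched statements that fail late, nested transactions implemented as a counter, and a retry loop that does not inspect the exception.

```java
import java.util.ArrayList;
import java.util.List;

class SwallowedErrorSketch {

    /** Hypothetical stand-in for a store that batches writes and counts nested transactions. */
    static class ToyStore {
        int openTxCount = 0;
        boolean committed = false;
        private final List<String> pending = new ArrayList<>();

        // Nested calls only bump the counter; no second transaction is created.
        boolean openTransaction() {
            openTxCount++;
            return true;
        }

        // Statements are queued, so a bad statement does not fail at this point.
        void execute(String statement) {
            pending.add(statement);
        }

        // The next backend round trip pushes the queued batch, so the deferred
        // error surfaces here instead of at the statement that caused it.
        void lockNotificationId() {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            for (String statement : batch) {
                if (statement.startsWith("FAIL")) {
                    throw new IllegalStateException("backend error for: " + statement);
                }
            }
        }

        // Only the outermost commit really commits.
        void commitTransaction() {
            openTxCount--;
            if (openTxCount == 0) {
                committed = true;
            }
        }
    }

    static boolean dropTableWithBlindRetry(ToyStore store) {
        store.openTransaction();                     // transaction_1
        store.execute("DELETE storage descriptor");  // related objects
        store.execute("FAIL DELETE table");          // fails only when pushed to the backend
        store.openTransaction();                     // "transaction_2": counter goes to 2
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                store.lockNotificationId();
                break;                               // lock acquired, create the notification
            } catch (IllegalStateException e) {
                // The exception is not inspected: it is assumed to be lock
                // contention, so the loop retries and the real error is lost.
            }
        }
        store.commitTransaction();                   // transaction_2: counter only
        store.commitTransaction();                   // transaction_1: reports success
        return store.committed;
    }

    public static void main(String[] args) {
        System.out.println("committed despite backend error: "
                + dropTableWithBlindRetry(new ToyStore()));
    }
}
```

In this toy model the outer commit reports success even though the table delete failed, which matches the invalid state described above.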
>
> In the customer setup, this issue could be fixed by adding a flush call
> before creating the notification event, so all the updates would be pushed to
> the backend DB and the error would manifest at that point. With this, the
> error would propagate back to the HiveMetastore class, which would do the
> rollback, and the delete table operation would fail as it should, since the
> table couldn't be deleted. The HiveMetastore retry mechanism could then try
> the table deletion again.
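
A minimal sketch of the flush-before-notification idea, under the same kind of assumptions: ToyStore, flush, addNotificationEvent and dropTable are hypothetical names for illustration and are not taken from the patch. The point it shows is ordering: once pending updates are pushed before the notification step, the backend error reaches the caller, which can roll back.

```java
import java.util.ArrayList;
import java.util.List;

class FlushBeforeNotifySketch {

    /** Hypothetical stand-in for a store that batches writes until flushed. */
    static class ToyStore {
        private final List<String> pending = new ArrayList<>();
        int notifications = 0;
        boolean committed = false;
        boolean rolledBack = false;

        // Statements are queued and not yet sent to the backend.
        void execute(String statement) {
            pending.add(statement);
        }

        // Push every queued statement to the backend; a bad one fails here.
        void flush() {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            for (String statement : batch) {
                if (statement.startsWith("FAIL")) {
                    throw new IllegalStateException("backend error for: " + statement);
                }
            }
        }

        void addNotificationEvent() {
            notifications++;
        }

        void commit() {
            committed = true;
        }

        void rollback() {
            pending.clear();
            rolledBack = true;
        }
    }

    // Returns true if the drop was committed, false if it was rolled back.
    static boolean dropTable(ToyStore store) {
        try {
            store.execute("DELETE storage descriptor");
            store.execute("FAIL DELETE table");
            store.flush();                 // the error manifests here, before the notification
            store.addNotificationEvent();  // not reached when the flush fails
            store.commit();
            return true;
        } catch (IllegalStateException e) {
            store.rollback();              // the caller sees the failure and rolls back
            return false;
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        System.out.println("drop succeeded: " + dropTable(store));
        System.out.println("rolled back: " + store.rolledBack);
    }
}
```

Because the failure is reported to the caller instead of being absorbed by the lock retry, a higher-level retry can attempt the whole delete again from a consistent state.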