[ https://issues.apache.org/jira/browse/KUDU-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511742#comment-16511742 ]
Dan Burkert commented on KUDU-2475:
-----------------------------------
The first thing I want to do is write some test cases that demonstrate the race.
After that, there are a few ways we could harden this up:
1) Investigate some type of 2PC-like intent log in the catalog manager. The
catalog manager would first persist all renames and drops to the master tablet
before sending the op to the HMS. Upon election, new leaders must immediately
replay this intent log. It's not clear, though, how the replaying leader could
robustly determine which ops have already been applied to the HMS. It would
probably also require integration with the RPC cache to work correctly with
retrying clients.
2) Modify the HMS to either pipe an identifier from each request through to the
resulting notification log entry, or, equivalently, return the notification log
entry ID with each response.
3) At the cheap/hacky end of the spectrum, perhaps the master could just
sanity-check that the operation it's waiting on actually completed before
returning to the client?
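Option 1) can be illustrated with a toy, in-memory model of an intent log. This is a minimal sketch under stated assumptions, not Kudu code: `Intent`, `IntentLog`, `FakeHms` and `replay` are hypothetical names, the "HMS" is just a set of table names, and "already applied" is decided by inspecting that set. That last check only works because the toy ops leave observable state behind; doing it robustly against the real HMS is exactly the open question raised above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Intent:
    """A pending catalog op, persisted before the HMS is touched (hypothetical)."""
    op_id: int
    kind: str                       # "rename" or "drop"
    table: str
    new_name: Optional[str] = None
    done: bool = False


class FakeHms:
    """Hypothetical stand-in for the HMS: just a set of table names."""

    def __init__(self, tables: Set[str]):
        self.tables = set(tables)

    def apply(self, intent: Intent) -> None:
        if intent.kind == "rename":
            self.tables.discard(intent.table)
            self.tables.add(intent.new_name)
        elif intent.kind == "drop":
            self.tables.discard(intent.table)

    def already_applied(self, intent: Intent) -> bool:
        # Works only because these toy ops leave observable state behind;
        # doing this robustly against the real HMS is the open question.
        if intent.kind == "rename":
            return intent.table not in self.tables and intent.new_name in self.tables
        return intent.table not in self.tables


class IntentLog:
    """Hypothetical intent log; in Kudu it would live in the master tablet."""

    def __init__(self):
        self.entries: Dict[int, Intent] = {}

    def persist(self, intent: Intent) -> None:
        # Step 1: record the intent durably *before* calling the HMS.
        self.entries[intent.op_id] = intent

    def mark_done(self, op_id: int) -> None:
        # Step 3: record completion once the HMS call has succeeded.
        self.entries[op_id].done = True

    def pending(self) -> List[Intent]:
        return [e for e in sorted(self.entries.values(), key=lambda e: e.op_id)
                if not e.done]


def replay(log: IntentLog, hms: FakeHms) -> List[int]:
    """Run by a newly elected leader; returns the op IDs it re-applied."""
    replayed = []
    for intent in log.pending():
        if not hms.already_applied(intent):
            hms.apply(intent)           # step 2, redone on the old leader's behalf
            replayed.append(intent.op_id)
        log.mark_done(intent.op_id)
    return replayed


if __name__ == "__main__":
    log = IntentLog()
    hms = FakeHms({"db.a", "db.b"})
    log.persist(Intent(1, "rename", "db.a", "db.a2"))
    hms.apply(log.entries[1])              # HMS applied it, leader died before mark_done
    log.persist(Intent(2, "drop", "db.b"))  # leader died before calling the HMS
    print(replay(log, hms))                # [2]
    print(sorted(hms.tables))              # ['db.a2']
```

The demo covers both crash points: op 1 is recognized as already applied and skipped, while op 2 is re-applied. Retrying clients (the RPC cache concern) are not modeled here.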
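Option 2) can be sketched as follows, assuming the HMS were modified so that each mutating call returns the ID of the notification log entry it produced. `FakeHms`, `NotificationLogListener` and `rename_and_wait` are hypothetical names. The listener applies events to an in-memory Kudu catalog in log order, and the master acks only once the listener has processed the event ID tied to *this* request. A real listener would run in the background; here `poll()` is called synchronously to keep the toy deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NotificationEvent:
    event_id: int
    kind: str                       # "rename" or "drop"
    table: str
    new_name: Optional[str] = None


class FakeHms:
    """Hypothetical HMS whose mutating calls return their notification log entry ID."""

    def __init__(self):
        self.log: List[NotificationEvent] = []

    def _append(self, kind: str, table: str, new_name: Optional[str] = None) -> int:
        event = NotificationEvent(len(self.log) + 1, kind, table, new_name)
        self.log.append(event)
        return event.event_id

    def rename_table(self, old: str, new: str) -> int:
        return self._append("rename", old, new)

    def drop_table(self, name: str) -> int:
        return self._append("drop", name)

    def events_after(self, event_id: int) -> List[NotificationEvent]:
        return [e for e in self.log if e.event_id > event_id]


class NotificationLogListener:
    """Applies HMS events to the Kudu catalog (name -> table ID) in log order."""

    def __init__(self, hms: FakeHms, catalog: Dict[str, str]):
        self.hms = hms
        self.catalog = catalog
        self.last_processed = 0

    def poll(self) -> None:
        for e in self.hms.events_after(self.last_processed):
            if e.kind == "rename" and e.table in self.catalog:
                self.catalog[e.new_name] = self.catalog.pop(e.table)
            elif e.kind == "drop":
                self.catalog.pop(e.table, None)
            self.last_processed = e.event_id


def rename_and_wait(hms: FakeHms, listener: NotificationLogListener,
                    old: str, new: str, max_polls: int = 100) -> bool:
    """Ack only once the listener has processed this request's own event."""
    event_id = hms.rename_table(old, new)
    for _ in range(max_polls):
        listener.poll()
        if listener.last_processed >= event_id:
            return True
    return False  # timed out: report an error rather than a false ack


if __name__ == "__main__":
    hms = FakeHms()
    catalog = {"db.t": "tid-1"}
    listener = NotificationLogListener(hms, catalog)
    print(rename_and_wait(hms, listener, "db.t", "db.t2"))  # True
    print(catalog)                                          # {'db.t2': 'tid-1'}
```

Knowing the exact event ID removes any guessing about which log entry corresponds to the request. It does not by itself prove the catalog changed (the listener skips a rename of an unknown table). That gap is what the check in option 3) targets.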
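The sanity check in option 3) might look like the sketch below. It assumes the master remembers the ID of the table the request targeted and can read its catalog as a name to table ID map. `CatalogError`, `verify_rename` and `verify_drop` are hypothetical names. Checking by table ID rather than by name is a design choice made here so that a different table created concurrently under the same name is not mistaken for the result of this operation.

```python
from typing import Dict


class CatalogError(Exception):
    """Hypothetical error returned to the client instead of a false ack."""


def verify_rename(catalog: Dict[str, str], old: str, new: str, table_id: str) -> None:
    """Raise unless the table with this ID is now registered under the new name."""
    if catalog.get(new) != table_id:
        raise CatalogError(f"rename {old} -> {new} (id {table_id}) not reflected in catalog")


def verify_drop(catalog: Dict[str, str], name: str, table_id: str) -> None:
    """Raise if the table with this ID is still present under any name."""
    if table_id in catalog.values():
        raise CatalogError(f"drop of {name} (id {table_id}) not reflected in catalog")


if __name__ == "__main__":
    catalog = {"db.t2": "tid-1"}
    verify_rename(catalog, "db.t", "db.t2", "tid-1")  # passes: rename is visible
    try:
        verify_drop(catalog, "db.t2", "tid-1")        # table tid-1 is still present
    except CatalogError as err:
        print(err)  # drop of db.t2 (id tid-1) not reflected in catalog
```

The master would run the matching check after its wait completes and return `CatalogError` to the client instead of an ack when the check fails.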
> HMS Catalog consistency with multi-master
> -----------------------------------------
>
> Key: KUDU-2475
> URL: https://issues.apache.org/jira/browse/KUDU-2475
> Project: Kudu
> Issue Type: Improvement
> Components: hms, master
> Reporter: Dan Burkert
> Assignee: Dan Burkert
> Priority: Major
>
> There are potential issues in the current iteration of the HMS integration
> that may cause clients to receive acks for ALTER TABLE RENAME / DROP TABLE
> operations that don't succeed. See [Adar's
> comment|https://gerrit.cloudera.org/c/8313/27/src/kudu/master/hms_notification_log_listener.cc#234]
> on the notification log listener patch for more info.