[
https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497023#comment-16497023
]
Ankit Singhal commented on HBASE-20642:
---------------------------------------
{quote} This is a general problem with the synchronous calls; they are not
built to migrate across server failure. Currently there is no connection
between running procedure and client invocation other than the stalled call.
Perhaps we could build some sort of tether but the thinking was that we'd move
off these old-style (deprecated) synchronous calls to instead use async where
we do have a connection between the invocation and the running procedure via
the returned future.
{quote}
The current implementation of synchronous calls in HBase simulates the way the
client will handle the async calls, i.e. waiting for the future to return the
results. So, except for the Modify and Truncate table procedures, the current
mechanism is good: we submit the procedure and then check periodically, in
separate calls, for the procedure to complete, which can handle a migration of
the master as well.
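To make the submit-then-poll pattern concrete, here is a minimal sketch. All
class and method names in it (ProcedureStore, Master, waitForCompletion) are
hypothetical stand-ins, not HBase classes; the assumption is only that
procedure state lives in a store that a newly active master can read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch: none of these types are HBase classes.
final class ProcedurePollingSketch {

    /** Stand-in for durable procedure state that outlives a single master. */
    static final class ProcedureStore {
        private final Map<Long, Boolean> finished = new ConcurrentHashMap<>();
        private final AtomicLong nextId = new AtomicLong(1);

        long add() {
            long id = nextId.getAndIncrement();
            finished.put(id, Boolean.FALSE);
            return id;
        }

        void markFinished(long procId) { finished.put(procId, Boolean.TRUE); }

        boolean isFinished(long procId) { return finished.getOrDefault(procId, Boolean.FALSE); }
    }

    /** Stand-in for a master; every master instance reads the same store. */
    static final class Master {
        private final ProcedureStore store;

        Master(ProcedureStore store) { this.store = store; }

        /** Returns as soon as the procedure is recorded, not when it completes. */
        long submit() { return store.add(); }

        boolean isDone(long procId) { return store.isFinished(procId); }

        void run(long procId) { store.markFinished(procId); }
    }

    /**
     * Asks whichever master is currently active, in separate calls, whether the
     * procedure has finished. A real client would pause between polls.
     */
    static boolean waitForCompletion(Supplier<Master> activeMaster, long procId, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (activeMaster.get().isDone(procId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ProcedureStore store = new ProcedureStore();
        Master oldMaster = new Master(store);
        long procId = oldMaster.submit();

        // The old master goes away; a new master picks up the same store
        // and finishes the procedure.
        Master newMaster = new Master(store);
        newMaster.run(procId);

        // The client only holds the procedure id, so it can poll the new master.
        System.out.println("completed=" + waitForCompletion(() -> newMaster, procId, 5)); // prints "completed=true"
    }
}
```

The point of the sketch is that the client keeps only the procedure id, so it
does not matter which master answers the completion check.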
{quote} Not to complete. The latch covers setup of the procedure only (A quote
from HBASE-19953 suggests doc to make it clear that "....the latch is just-for
the Procedure preparation – that we are not blocking for the whole procedure
run...")
{quote}
Yes, but for the Modify and Truncate table procedures only, the latch is
released at the end of the procedure. Raised HBASE-20658 for that.
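A rough sketch of the difference, using a plain java.util.concurrent
CountDownLatch. The Procedure class, its step names and the ReleasePoint enum
are made up for illustration and are not the HBase procedure code.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: the step names and ReleasePoint are illustrative only.
final class PrepareLatchSketch {

    enum ReleasePoint { AFTER_PRECHECKS, AT_END }

    static final class Procedure {
        final CountDownLatch latch = new CountDownLatch(1);
        private final ReleasePoint releasePoint;

        Procedure(ReleasePoint releasePoint) { this.releasePoint = releasePoint; }

        void preChecks() {
            // Releasing here unblocks the submitting call once preparation is done.
            if (releasePoint == ReleasePoint.AFTER_PRECHECKS) {
                latch.countDown();
            }
        }

        void body() {
            // Long-running work would happen here; a master failure at this
            // point leaves an AT_END caller still blocked on the latch.
        }

        void finish() {
            // countDown() on a latch that is already at zero is a no-op.
            latch.countDown();
        }
    }

    /** True if the submitting call would already be unblocked before the body runs. */
    static boolean submitUnblockedBeforeBody(ReleasePoint releasePoint) {
        Procedure proc = new Procedure(releasePoint);
        proc.preChecks();
        return proc.latch.getCount() == 0;
    }

    public static void main(String[] args) {
        System.out.println(submitUnblockedBeforeBody(ReleasePoint.AFTER_PRECHECKS)); // prints "true"
        System.out.println(submitUnblockedBeforeBody(ReleasePoint.AT_END));          // prints "false"
    }
}
```

With AT_END the submitting call stays blocked for the whole run, which is the
window in which a master failure can make the client retry the submission.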
{quote}Yeah, this is a problem (why have nonce's if client is doing this...).
Does this break your suggested solution here? Or rather, it needs client
changes too?
{quote}
We may not need client change if we fix
{quote}Re-reading the description, how would ensuring nonce-respect help? We'll
not resubmit the procedure but neither will we recognize its successful
completion since it happens on the new master, not the old.
{quote}
For synchronous calls as well, we check for procedure completion by
periodically requesting the procedure result from the server, so the call will
learn if a new master has completed the procedure.
The procedure is getting resubmitted for the Modify and Truncate table
procedures because of HBASE-20658.
Just to summarize:
if we fix HBASE-20658 by releasing the latch after some pre-checks for Modify
and Truncate table, then we probably do not need the nonce check, as the retry
mechanism will not kick in if the procedure is submitted successfully.
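For reference, the nonce check being discussed amounts to something like the
sketch below: a retried submission carrying the same nonce maps back to the
already-submitted procedure instead of creating a new one. The class name and
the in-memory map are assumptions for illustration, not how the master
actually tracks nonces.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of nonce-based de-duplication; not the HBase implementation.
final class NonceDedupSketch {
    private final Map<Long, Long> procIdByNonce = new ConcurrentHashMap<>();
    private final AtomicLong nextProcId = new AtomicLong(1);

    /** Returns the existing procedure id when the same nonce is submitted again. */
    long submit(long nonce) {
        return procIdByNonce.computeIfAbsent(nonce, n -> nextProcId.getAndIncrement());
    }

    public static void main(String[] args) {
        NonceDedupSketch master = new NonceDedupSketch();
        long first = master.submit(42L);
        long retry = master.submit(42L);   // client retry with the same nonce
        long other = master.submit(43L);   // a genuinely new request
        System.out.println(first == retry); // prints "true"
        System.out.println(first == other); // prints "false"
    }
}
```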
Thanks [~stack]. What do you say about HBASE-20658?
> IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
> -------------------------------------------------------------------------
>
> Key: HBASE-20642
> URL: https://issues.apache.org/jira/browse/HBASE-20642
> Project: HBase
> Issue Type: Bug
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-20642.patch
>
>
> [~romil.choksi] reported that IntegrationTestDDLMasterFailover fails
> when adding a column family while the master is restarting.