[
https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495259#comment-16495259
]
stack commented on HBASE-20642:
-------------------------------
bq. ....we are already rebuilding in-memory nonce map from the persisted
noncekey. so the retry should also get identified on the new master and
ignored.
You are right. I forgot about this aspect. Updated our rough doc to talk about
Nonce persistence....
https://docs.google.com/document/d/1QLXlVERKt5EMbx_EL3Y2u0j64FN-_TrVoM5WWxIXh6o/edit#
(I need to wind it into our refguide).
bq. Though there is different problem with the client which is generating new
noncekey for every retry.
Yeah, this is a problem (why have nonces if the client is doing this...). Does
this break your suggested solution here? Or rather, does it need client changes
too?
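
To make the nonce point concrete, here is a rough sketch of the idea, not the
actual HBase code. The class names (NonceRetrySketch, Key, Master) and the
persistedState() helper are made up for illustration. It assumes the master
keeps a map from nonce key to procedure id and that a new master rebuilds that
map from what was persisted:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class NonceRetrySketch {
    /** Hypothetical stand-in for a (nonce group, nonce) pair; not HBase's own class. */
    static final class Key {
        final long group;
        final long nonce;

        Key(long group, long nonce) {
            this.group = group;
            this.nonce = nonce;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).group == group && ((Key) o).nonce == nonce;
        }

        @Override
        public int hashCode() {
            return Objects.hash(group, nonce);
        }
    }

    /** Hypothetical master that keeps a nonce-key-to-procedure-id map. */
    static final class Master {
        private final Map<Key, Long> nonceToProcId;
        private long nextProcId;

        /** A new master rebuilds its in-memory map from whatever was persisted. */
        Master(Map<Key, Long> persisted) {
            this.nonceToProcId = new HashMap<>(persisted);
            long max = 0;
            for (long id : persisted.values()) {
                max = Math.max(max, id);
            }
            this.nextProcId = max + 1;
        }

        /** Returns the existing procedure id for a known nonce, else starts a new one. */
        synchronized long submit(Key key) {
            Long existing = nonceToProcId.get(key);
            if (existing != null) {
                return existing;
            }
            long id = nextProcId++;
            nonceToProcId.put(key, id);
            return id;
        }

        /** Snapshot standing in for the persisted nonce keys (an assumption of this sketch). */
        synchronized Map<Key, Long> persistedState() {
            return new HashMap<>(nonceToProcId);
        }
    }

    public static void main(String[] args) {
        Master oldMaster = new Master(new HashMap<>());
        long first = oldMaster.submit(new Key(1L, 100L));

        // Failover: the new master reloads the persisted nonce keys.
        Master newMaster = new Master(oldMaster.persistedState());

        long sameNonceRetry = newMaster.submit(new Key(1L, 100L));
        long freshNonceRetry = newMaster.submit(new Key(1L, 101L));

        System.out.println("same nonce reuses procedure: " + (sameNonceRetry == first));
        System.out.println("new nonce starts another: " + (freshNonceRetry != first));
    }
}
```

A retry that carries the original nonce is recognized by the new master. A
retry with a freshly generated nonce is not, which is why the client behavior
matters.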
bq. We may not catch it If the master is killed, nevertheless user may not
expect the exception if the procedure is completed successfully at the new
master.
Yes.
This is a general problem with the synchronous calls; they are not built to
migrate across server failure. Currently there is no connection between the
running procedure and the client invocation other than the stalled call.
Perhaps we could build some sort of tether, but the thinking was that we'd move
off these old-style (deprecated) synchronous calls and use async instead, where
we do have a connection between the invocation and the running procedure via
the returned future.
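
A toy illustration of what such a connection buys us, under my own
assumptions: the handle below remembers only a procedure id and asks whichever
master is currently active for the result, so it does not depend on the call
that started the procedure. ProcedureHandleSketch, Master, and ProcedureHandle
are hypothetical names, not the HBase client API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

public class ProcedureHandleSketch {
    /** Hypothetical master: maps a procedure id to its result once it has finished. */
    static final class Master {
        final Map<Long, String> finished = new ConcurrentHashMap<>();
    }

    /** Hypothetical handle tied to the procedure id rather than to the original RPC. */
    static final class ProcedureHandle {
        private final long procId;
        private final AtomicReference<Master> activeMaster;

        ProcedureHandle(long procId, AtomicReference<Master> activeMaster) {
            this.procId = procId;
            this.activeMaster = activeMaster;
        }

        /** Returns the result if the active master reports one, else null. */
        String poll() {
            return activeMaster.get().finished.get(procId);
        }
    }

    public static void main(String[] args) {
        Master oldMaster = new Master();
        AtomicReference<Master> active = new AtomicReference<>(oldMaster);
        ProcedureHandle handle = new ProcedureHandle(42L, active);

        System.out.println("before failover: " + handle.poll());

        // Failover: a new master takes over and later finishes the procedure.
        Master newMaster = new Master();
        active.set(newMaster);
        newMaster.finished.put(42L, "SUCCESS");

        System.out.println("after failover: " + handle.poll());
    }
}
```

The stalled synchronous call has nothing like the procedure id to fall back on
once its connection to the old master is gone.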
Meantime we have this half-way situation where the client's synchronous API
call fails but, behind the scenes, the procedure may still succeed. The
circumstance should be rare in practice but yeah, what can we do so we don't
surprise the operator?
bq. ....It seems after HBASE-19953, Asynchronous call for DDLs also wait for
the procedure to complete( as countDown() will happen when the procedure is
completed)....
Not to complete. The latch covers setup of the procedure only. (A quote from
HBASE-19953 suggests adding doc to make it clear that "....the latch is
just-for the Procedure preparation – that we are not blocking for the whole
procedure run...".)
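
Roughly, the shape is something like the sketch below. This is not the
HBASE-19953 code; PrepareLatchSketch, Procedure, and the mayFinish latch are
invented so the ordering can be shown deterministically. The caller is
released once preparation is done, while the rest of the procedure is still
running:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class PrepareLatchSketch {
    /** Hypothetical procedure: counts the latch down after preparation, then keeps working. */
    static final class Procedure implements Runnable {
        final CountDownLatch prepared = new CountDownLatch(1);
        // Test hook (an assumption of this sketch) that holds the procedure open after prepare.
        final CountDownLatch mayFinish = new CountDownLatch(1);
        final AtomicBoolean completed = new AtomicBoolean(false);

        @Override
        public void run() {
            // Preparation (checks, setup) would happen here.
            prepared.countDown(); // releases the caller; the procedure is NOT done yet
            try {
                mayFinish.await(); // stands in for the remaining procedure steps
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            completed.set(true);
        }
    }

    /** Returns {completed when the caller was released, completed at the end}. */
    static boolean[] demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Procedure procedure = new Procedure();
        try {
            executor.submit(procedure);
            procedure.prepared.await(); // caller blocks for preparation only
            boolean atRelease = procedure.completed.get();
            procedure.mayFinish.countDown();
            executor.shutdown();
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("procedure did not finish in time");
            }
            return new boolean[] {atRelease, procedure.completed.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("completed when caller released: " + result[0]);
        System.out.println("completed at the end: " + result[1]);
    }
}
```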
> IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
> -------------------------------------------------------------------------
>
> Key: HBASE-20642
> URL: https://issues.apache.org/jira/browse/HBASE-20642
> Project: HBase
> Issue Type: Bug
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-20642.patch
>
>
> [~romil.choksi] reported that IntegrationTestDDLMasterFailover is failing
> while adding column family during the time master is restarting.