[ 
https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495259#comment-16495259
 ] 

stack commented on HBASE-20642:
-------------------------------

bq. ....we are already rebuilding in-memory nonce map from the persisted 
noncekey. so the retry should also get identified on the new master and 
ignored. 

You are right. I forgot about this aspect. Updated our rough doc to talk about 
Nonce persistence.... 
https://docs.google.com/document/d/1QLXlVERKt5EMbx_EL3Y2u0j64FN-_TrVoM5WWxIXh6o/edit#
 (I need to wind it into our refguide).

bq. Though there is different problem with the client which is generating new 
noncekey for every retry.

Yeah, this is a problem (why have nonces if the client is doing this...). Does 
this break your suggested solution here? Or rather, does it need client changes 
too?
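
To make the nonce point concrete, here is a rough sketch of why a retry has to 
carry the same nonce to be recognized. This is not HBase code: the class 
NonceDedupSketch, its submit method and the in-memory map are all made up for 
illustration, and the real master rebuilds its map from persisted state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not HBase code: a "master" that dedups submissions by nonce.
class NonceDedupSketch {
    // nonce -> procedure id (assumption: stands in for the rebuilt in-memory nonce map)
    private final Map<Long, Long> nonceToProcId = new ConcurrentHashMap<>();
    private final AtomicLong nextProcId = new AtomicLong(1);

    // Returns the procedure id for this nonce; a new procedure is created
    // only the first time the nonce is seen.
    long submit(long nonce) {
        return nonceToProcId.computeIfAbsent(nonce, n -> nextProcId.getAndIncrement());
    }

    // Number of distinct procedures created so far.
    int procedureCount() {
        return nonceToProcId.size();
    }

    public static void main(String[] args) {
        NonceDedupSketch master = new NonceDedupSketch();
        long first = master.submit(42L);
        long retrySameNonce = master.submit(42L); // recognized as a retry
        long retryNewNonce = master.submit(43L);  // looks like a brand-new request
        System.out.println("same nonce deduped: " + (first == retrySameNonce));
        System.out.println("new nonce deduped:  " + (first == retryNewNonce));
        System.out.println("procedures created: " + master.procedureCount());
    }
}
```

A retry that arrives with a fresh nonce is indistinguishable from a new 
request, so dedup on the master side only helps if the client reuses the nonce.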

bq. We may not catch it If the master is killed, nevertheless user may not 
expect the exception if the procedure is completed successfully at the new 
master.

Yes.

This is a general problem with the synchronous calls; they are not built to 
migrate across server failure. Currently there is no connection between the 
running procedure and the client invocation other than the stalled call. 
Perhaps we could build some sort of tether but the thinking was that we'd move 
off these old-style (deprecated) synchronous calls to instead use async where 
we do have a connection between the invocation and the running procedure via 
the returned future.
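
As a rough illustration of what the returned future buys us, a sketch only: 
submitProcedure and AsyncTetherSketch are hypothetical names, not the HBase 
async admin API. The caller holds a handle to the outcome rather than sitting 
in a stalled call.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch, not the HBase API: the invocation returns a future
// that is the caller's link to the eventual outcome of the procedure.
class AsyncTetherSketch {
    // Stand-in for an async admin call: returns immediately with a future.
    static CompletableFuture<String> submitProcedure(String name) {
        return CompletableFuture.supplyAsync(() -> name + ": SUCCESS");
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = submitProcedure("addColumnFamily");
        // The caller is free to do other work here, then collect the outcome.
        System.out.println(result.join());
    }
}
```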

Meantime we have this half-way situation where the synchronous client call 
fails but, behind the scenes, the procedure may still succeed. The circumstance 
should be rare in practice but yeah, what to do so we don't surprise the 
operator?

bq. ....It seems after HBASE-19953, Asynchronous call for DDLs also wait for 
the procedure to complete( as countDown() will happen when the procedure is 
completed)....

Not to complete. The latch covers setup of the procedure only. (A comment in 
HBASE-19953 suggests adding doc to make it clear that "....the latch is 
just-for the Procedure preparation – that we are not blocking for the whole 
procedure run...".)
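
A minimal sketch of the distinction, assuming nothing about the real 
implementation (PrepareLatchSketch and all of its members are invented for 
illustration): the latch the caller waits on is counted down once preparation 
is done, while the rest of the procedure is still running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not HBase code: a latch that covers preparation only.
class PrepareLatchSketch {
    private final CountDownLatch prepared = new CountDownLatch(1);
    private final CountDownLatch finished = new CountDownLatch(1);
    // Test hook standing in for the long-running remainder of the procedure.
    private final CountDownLatch allowFinish = new CountDownLatch(1);

    void start() {
        Thread worker = new Thread(() -> {
            // Preparation (e.g. sanity checks) would run here.
            prepared.countDown(); // the caller is released at this point
            try {
                allowFinish.await(); // remainder of the procedure
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            finished.countDown();
        });
        worker.setDaemon(true);
        worker.start();
    }

    boolean awaitPrepared(long seconds) { return await(prepared, seconds); }
    boolean awaitFinished(long seconds) { return await(finished, seconds); }
    boolean isFinished() { return finished.getCount() == 0; }
    void release() { allowFinish.countDown(); }

    private static boolean await(CountDownLatch latch, long seconds) {
        try {
            return latch.await(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        PrepareLatchSketch proc = new PrepareLatchSketch();
        proc.start();
        boolean ready = proc.awaitPrepared(5);
        System.out.println("prepared=" + ready + ", finished=" + proc.isFinished());
        proc.release();
        System.out.println("finished=" + proc.awaitFinished(5));
    }
}
```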




> IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException 
> -------------------------------------------------------------------------
>
>                 Key: HBASE-20642
>                 URL: https://issues.apache.org/jira/browse/HBASE-20642
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ankit Singhal
>            Assignee: Ankit Singhal
>            Priority: Major
>         Attachments: HBASE-20642.patch
>
>
> [~romil.choksi] reported that IntegrationTestDDLMasterFailover is failing 
> while adding column family during the time master is restarting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
