[
https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494416#comment-16494416
]
Ankit Singhal commented on HBASE-20642:
---------------------------------------
bq. That is not my understanding. The nonces are in an in-memory-only map in
the Master process. They will not be migrated from one Master to the new
one.... so, even if you put calls behind a nonce-check, it'll fail since the
nonce-map is empty on new Master.
I just checked the code: while loading procedures
(ProcedureExecutor#loadProcedures) from MasterProcWals during restart, we
already rebuild the in-memory nonce map from the persisted nonce keys, so the
retry should also be identified on the new master and ignored. There is a
different problem on the client side, though: it generates a new nonce key for
every retry.
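To illustrate the point about the nonce map, here is a minimal sketch in Java.
It is a toy model, not the ProcedureExecutor code: all class and method names
(NonceKeySketch, PersistedProcSketch, MasterSketch, submit) are hypothetical,
and the "WAL" is just an in-memory list standing in for the persisted
procedures. It assumes only what is described above: each persisted procedure
carries its nonce key, and the new master rebuilds the map while loading.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a (nonceGroup, nonce) pair; not an HBase class.
final class NonceKeySketch {
    final long nonceGroup;
    final long nonce;

    NonceKeySketch(long nonceGroup, long nonce) {
        this.nonceGroup = nonceGroup;
        this.nonce = nonce;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof NonceKeySketch)) {
            return false;
        }
        NonceKeySketch other = (NonceKeySketch) o;
        return nonceGroup == other.nonceGroup && nonce == other.nonce;
    }

    @Override
    public int hashCode() {
        return Objects.hash(nonceGroup, nonce);
    }
}

// Hypothetical persisted record: a procedure id plus the nonce key it was submitted with.
final class PersistedProcSketch {
    final long procId;
    final NonceKeySketch nonceKey;

    PersistedProcSketch(long procId, NonceKeySketch nonceKey) {
        this.procId = procId;
        this.nonceKey = nonceKey;
    }
}

// Hypothetical master: rebuilds its in-memory nonce map from the persisted records on startup.
final class MasterSketch {
    private final Map<NonceKeySketch, Long> nonceToProcId = new HashMap<>();
    private final List<PersistedProcSketch> wal;
    private long nextProcId;

    MasterSketch(List<PersistedProcSketch> wal) {
        this.wal = wal;
        long maxId = 0;
        for (PersistedProcSketch p : wal) {
            nonceToProcId.put(p.nonceKey, p.procId); // rebuild step
            maxId = Math.max(maxId, p.procId);
        }
        this.nextProcId = maxId + 1;
    }

    // Returns the existing procedure id for a known nonce key, otherwise registers a new one.
    long submit(NonceKeySketch key) {
        Long existing = nonceToProcId.get(key);
        if (existing != null) {
            return existing; // retry recognised, no second procedure is created
        }
        long id = nextProcId++;
        wal.add(new PersistedProcSketch(id, key));
        nonceToProcId.put(key, id);
        return id;
    }
}
```

In this model a retry that reuses the same nonce key gets the original
procedure id back from the new master, while a retry that arrives with a fresh
nonce key is treated as a brand new request, which is the client-side problem
mentioned above.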
bq. Because the Master is failing which broke the synchronous wait on add
column? Maybe add a check if master is going down and if it is throw that for
an exception instead of doing this pre-flight check against current state of
table descriptor? Would that be more meaningful?
We may not catch it if the master is killed. In any case, the user would not
expect an exception if the procedure completes successfully on the new master.
bq. It is pretty cool that the call keeps going though the Master has
crashed... I think it is a bit much to expect that this call can pick up where
it left off on the old Master though. It has no reference to the original
transaction (it does not have a Future .... ).
The retry call is moved to the new master, and the new master, during
initialization, picks up the procedure from the state that the old master last
persisted in ProcWals and ignores the retry. So we should not have any problem
as long as the operations for each state are idempotent. Won't the same process
happen when a standby master becomes active?
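A small Java sketch of why idempotent steps make this resume safe. It is only
an illustration under stated assumptions: the class AddFamilySketch, its states
and the Durable holder are hypothetical and do not mirror the real procedure
states in HBase. The "crash" is simulated by running a step without persisting
the state transition, so the new master re-executes that step.

```java
import java.util.Set;
import java.util.TreeSet;

final class AddFamilySketch {
    // Hypothetical states; the real procedure uses different ones.
    enum State { PREPARE, UPDATE_DESCRIPTOR, REOPEN_REGIONS, DONE }

    // Stand-in for durable storage shared by the old and the new master.
    static final class Durable {
        State state = State.PREPARE;
        final Set<String> families = new TreeSet<>();
    }

    private final Durable store;
    private final String family;

    AddFamilySketch(Durable store, String family) {
        this.store = store;
        this.family = family;
    }

    // Runs the step for the persisted state; if crashBeforePersist is true,
    // the side effect happens but the state transition is not recorded.
    void executeStep(boolean crashBeforePersist) {
        State next;
        switch (store.state) {
            case PREPARE:
                next = State.UPDATE_DESCRIPTOR;
                break;
            case UPDATE_DESCRIPTOR:
                store.families.add(family); // idempotent: adding twice changes nothing
                next = State.REOPEN_REGIONS;
                break;
            case REOPEN_REGIONS:
                next = State.DONE;
                break;
            default:
                return; // already DONE
        }
        if (!crashBeforePersist) {
            store.state = next;
        }
    }

    // What a new master would do after loading the persisted state.
    void runToCompletion() {
        while (store.state != State.DONE) {
            executeStep(false);
        }
    }
}
```

Here the UPDATE_DESCRIPTOR step can run twice without harm because adding an
already-present family to a set is a no-op. A step that instead rejected a
family that already exists would fail on the second run, so the argument only
holds if every step is written to tolerate re-execution.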
bq. We want to move folks over to the async calls where they check to see if
the Procedure is completed.....
It seems that after HBASE-19953, asynchronous calls for DDLs also wait for the
procedure to complete (since countDown() happens only when the procedure is
completed).
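As a rough picture of the waiting behaviour described here, the following Java
sketch uses a java.util.concurrent.CountDownLatch. It is not the HBASE-19953
code: the class LatchWaitSketch and its methods are hypothetical, and the only
assumption taken from the discussion is that countDown() is called when the
procedure completes and the caller blocks until then.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LatchWaitSketch {
    // One count: released once, when the procedure reports completion.
    private final CountDownLatch done = new CountDownLatch(1);

    // Hypothetical hook invoked when the procedure has finished.
    void onProcedureCompleted() {
        done.countDown();
    }

    // Blocks up to timeoutMs; returns true only if the procedure completed in time.
    boolean awaitCompletion(long timeoutMs) {
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

With this shape, a call that awaits the latch does not return successfully
until completion has been signalled, even if the caller reached it through an
"async" entry point.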
Thanks for bearing with me, I know there are too many questions.
> IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
> -------------------------------------------------------------------------
>
> Key: HBASE-20642
> URL: https://issues.apache.org/jira/browse/HBASE-20642
> Project: HBase
> Issue Type: Bug
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-20642.patch
>
>
> [~romil.choksi] reported that IntegrationTestDDLMasterFailover fails
> while adding a column family while the master is restarting.