Sergey Shelukhin created HBASE-22081:
----------------------------------------

             Summary: master shutdown: close RpcServer first thing, close 
procWAL as soon as viable, and delete znode the last thing
                 Key: HBASE-22081
                 URL: https://issues.apache.org/jira/browse/HBASE-22081
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin


I had a master get stuck due to HBASE-22079, and it was logging RS abort 
messages during shutdown.
[~bahramch] found some issues where messages are processed by the old master 
during shutdown due to a race condition in the RS cache (this could also happen 
due to a network race).
Previously I found a bug where an SCP created during master shutdown had 
incorrect state (because some structures had already been cleaned up).

I think before master fencing is implemented we can at least make these issues 
much less likely by thinking about shutdown order.
1) First kill the RPC server so we don't receive any more messages.
2) Then do whatever cleanup we think is needed that requires proc wal.
3) Then close proc WAL so no errant threads can create more procs.
4) Then do whatever other cleanup.
5) Finally delete znode.
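
The ordering above could be modeled roughly as follows. This is only a sketch 
of the proposed sequence: the class and every method name are hypothetical and 
are not taken from HMaster. Each step just records that it ran, and the checks 
show which earlier steps it assumes have already happened.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the proposed shutdown order; not HBase code. */
public class MasterShutdownSketch {
    private final List<String> steps = new ArrayList<>();
    private boolean rpcServerOpen = true;
    private boolean procWalOpen = true;

    /** Step 1: stop receiving messages. */
    public void stopRpcServer() {
        rpcServerOpen = false;
        steps.add("stop RPC server");
    }

    /** Step 2: cleanup that still needs the proc WAL. */
    public void cleanupNeedingProcWal() {
        requireState(procWalOpen, "proc WAL must still be open");
        steps.add("cleanup needing proc WAL");
    }

    /** Step 3: after this, no thread can create more procs. */
    public void closeProcWal() {
        requireState(!rpcServerOpen, "RPC server must be stopped first");
        procWalOpen = false;
        steps.add("close proc WAL");
    }

    /** Step 4: remaining cleanup that does not need the proc WAL. */
    public void otherCleanup() {
        steps.add("other cleanup");
    }

    /** Step 5: delete the znode as the very last action. */
    public void deleteZnode() {
        requireState(!rpcServerOpen && !procWalOpen,
            "RPC server and proc WAL must both be closed");
        steps.add("delete znode");
    }

    /** Models an errant thread trying to create a procedure (e.g. an SCP). */
    public boolean trySubmitProcedure() {
        return procWalOpen;
    }

    /** Runs the five steps in the proposed order and returns what ran. */
    public List<String> shutdown() {
        stopRpcServer();
        cleanupNeedingProcWal();
        closeProcWal();
        otherCleanup();
        deleteZnode();
        return steps;
    }

    private static void requireState(boolean ok, String message) {
        if (!ok) {
            throw new IllegalStateException(message);
        }
    }

    public static void main(String[] args) {
        MasterShutdownSketch master = new MasterShutdownSketch();
        master.shutdown().forEach(System.out::println);
    }
}
```

In this model, calling deleteZnode() before closeProcWal() throws, which is 
the kind of ordering violation the list is meant to rule out.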

Right now, I think, the znode is deleted somewhat early and RpcServer is closed 
very late.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
