[ https://issues.apache.org/jira/browse/HBASE-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell resolved HBASE-17055. ------------------------------------ Resolution: Incomplete Assignee: (was: Stephen Yuan Jiang) Fix Version/s: (was: 1.3.3) > Disabling table not getting enabled after clean cluster restart. > ---------------------------------------------------------------- > > Key: HBASE-17055 > URL: https://issues.apache.org/jira/browse/HBASE-17055 > Project: HBase > Issue Type: Bug > Components: Region Assignment > Affects Versions: 1.3.0 > Reporter: Y. SREENIVASULU REDDY > Priority: Major > > scenario: > 1. Disable the table, while disabling the table is in progress. > 2. Restart whole HBase service. > 3. Then enable the table. > the above operation leads to RIT continously. > pls find the below logs for understanding. > while disabling the table whole hbase service went down. > the following is the master logs > {noformat} > 2016-11-09 19:32:55,102 INFO > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] master.HMaster: > Client=seenu//host-1 disable testTable > 2016-11-09 19:32:55,257 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > procedure2.ProcedureExecutor: Procedure DisableTableProcedure > (table=testTable) id=8 owner=seenu state=RUNNABLE:DISABLE_TABLE_PREPARE added > to the store. > 2016-11-09 19:32:55,264 DEBUG [ProcedureExecutor-5] > lock.ZKInterProcessLockBase: Acquired a lock for > /hbase/table-lock/testTable/write-master:160000000000005 > 2016-11-09 19:32:55,285 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=8 > 2016-11-09 19:32:55,386 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=8 > 2016-11-09 19:32:55,513 INFO [ProcedureExecutor-5] > zookeeper.ZKTableStateManager: Moving table testTable state from DISABLING to > DISABLING > 2016-11-09 19:32:55,587 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=8 > 2016-11-09 19:32:55,628 INFO [ProcedureExecutor-5] > procedure.DisableTableProcedure: Offlining 1 regions. > . > . > . > . > . > . > . > . > 2016-11-09 19:33:02,871 INFO [AM.ZK.Worker-pool2-t7] master.RegionStates: > Offlined 1890fa9c085dcc2ee0602f4bab069d10 from host-1,16040,1478690163056 > Wed Nov 9 19:33:02 CST 2016 Terminating master > {noformat} > here we need to observe > {color:red} Offlined 1890fa9c085dcc2ee0602f4bab069d10 from > host-1,16040,1478690163056 {color} > then hmaster went down, all regionServers also made down. > After hmaster and regionserver are restarted > executed enable Table operation on the table. > {panel:title=HMaster > Logs|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#FFFFCE} > {noformat} > 2016-11-09 19:49:57,059 INFO > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] master.HMaster: > Client=seenu//host-1 enable testTable > 2016-11-09 19:49:57,325 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > procedure2.ProcedureExecutor: Procedure EnableTableProcedure > (table=testTable) id=9 owner=seenu state=RUNNABLE:ENABLE_TABLE_PREPARE added > to the store. > 2016-11-09 19:49:57,333 DEBUG [ProcedureExecutor-2] > lock.ZKInterProcessLockBase: Acquired a lock for > /hbase/table-lock/testTable/write-master:160000000000008 > 2016-11-09 19:49:57,335 DEBUG [hconnection-0x745317ee-shared--pool3-t11] > ipc.RpcClientImpl: Use SIMPLE authentication for service ClientService, > sasl=false > 2016-11-09 19:49:57,335 DEBUG [hconnection-0x745317ee-shared--pool3-t11] > ipc.RpcClientImpl: Connecting to host-1:16040 > 2016-11-09 19:49:57,347 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=9 > 2016-11-09 19:49:57,449 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=9 > 2016-11-09 19:49:57,579 INFO [ProcedureExecutor-2] > procedure.EnableTableProcedure: Attempting to enable the table testTable > 2016-11-09 19:49:57,580 INFO [ProcedureExecutor-2] > zookeeper.ZKTableStateManager: Moving table testTable state from DISABLED to > ENABLING > 2016-11-09 19:49:57,655 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=9 > 2016-11-09 19:49:57,707 INFO [ProcedureExecutor-2] > procedure.EnableTableProcedure: Table 'testTable' has 1 regions, of which 1 > are offline. > 2016-11-09 19:49:57,707 INFO [ProcedureExecutor-2] > procedure.EnableTableProcedure: Bulk assigning 1 region(s) across 1 > server(s), retainAssignment=true > 2016-11-09 19:49:57,710 DEBUG [ProcedureExecutor-2] > master.GeneralBulkAssigner: Timeout-on-RIT=91000 > 2016-11-09 19:49:57,710 INFO > [host-1,16000,1478691456965-GeneralBulkAssigner-0] master.AssignmentManager: > Assigning 1 region(s) to host-1,16040,1478691644081 > {noformat} > . > . > . > {color:red} > 2016-11-09 19:49:57,718 DEBUG [AM.-pool1-t1] master.AssignmentManager: Force > region state offline {1890fa9c085dcc2ee0602f4bab069d10 state=OFFLINE, > ts=1478692197716, server=host-1,16040,1478690163056} > 2016-11-09 19:49:57,722 INFO [AM.-pool1-t1] master.AssignmentManager: Skip > assigning testTable,,1478689618299.1890fa9c085dcc2ee0602f4bab069d10., it is > on a dead but not processed yet server: host-1,16040,1478690163056 > {color} > {noformat} > 2016-11-09 19:49:57,957 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=9 > 2016-11-09 19:49:58,459 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=9 > 2016-11-09 19:49:59,462 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=9 > {noformat} > . > . > . > {noformat} > 2016-11-09 19:51:38,720 DEBUG [ProcedureExecutor-2] > master.GeneralBulkAssigner: bulk assigning total 1 regions to 1 servers, took > 101010ms, with 1 regions still in transition > 2016-11-09 19:51:38,728 DEBUG [ProcedureExecutor-2] > procedure.EnableTableProcedure: Skipping assign for the region {ENCODED => > 1890fa9c085dcc2ee0602f4bab069d10, NAME => > 'testTable,,1478689618299.1890fa9c085dcc2ee0602f4bab069d10.', STARTKEY => '', > ENDKEY => ''} during enable table testTable because its already in tranition > or assigned. > 2016-11-09 19:51:38,728 INFO [ProcedureExecutor-2] > procedure.EnableTableProcedure: Table 'testTable' has 1 regions, of which 0 > are offline. > 2016-11-09 19:51:38,840 INFO [ProcedureExecutor-2] > zookeeper.ZKTableStateManager: Moving table testTable state from ENABLING to > ENABLED > {noformat} > {color:red} > 2016-11-09 19:51:38,846 INFO [ProcedureExecutor-2] > procedure.EnableTableProcedure: Table 'testTable' was successfully enabled. > {color} > {noformat} > 2016-11-09 19:51:39,081 DEBUG [ProcedureExecutor-2] > lock.ZKInterProcessLockBase: Released > /hbase/table-lock/testTable/write-master:160000000000008 > 2016-11-09 19:51:39,081 DEBUG [ProcedureExecutor-2] > procedure2.ProcedureExecutor: Procedure completed in 1mins, 41.898sec: > EnableTableProcedure (table=testTable) id=9 owner=seenu state=FINISHED > 2016-11-09 19:51:45,485 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=9 > 2016-11-09 19:53:08,485 DEBUG > [RpcServer.reader=4,bindAddress=host-1,port=16000] ipc.RpcServer: > RpcServer.listener,port=16000: DISCONNECTING client host-1:40313 because read > count=-1. Number of active connections: 3 > 2016-11-09 19:53:45,504 DEBUG > [RpcServer.reader=3,bindAddress=host-1,port=16000] ipc.RpcServer: > RpcServer.listener,port=16000: DISCONNECTING client host-1:40281 because read > count=-1. Number of active connections: 2 > {noformat} > {color:red} > 2016-11-09 19:55:48,650 DEBUG [host-1,16000,1478691456965_ChoreService_1] > master.HMaster: Not running balancer because 1 region(s) in transition: > {1890fa9c085dcc2ee0602f4bab069d10={1890fa9c085dcc2ee0602f4bab069d10 > state=OFFLINE, ts=1478692197716, server=host-1,16040,1478690163056}} > {color} > {noformat} > 2016-11-09 19:55:48,650 WARN [host-1,16000,1478691456965_ChoreService_1] > master.CatalogJanitor: CatalogJanitor disabled! Not running scan. > 2016-11-09 19:57:09,336 DEBUG [ProcedureExecutorTimeout] > procedure2.ProcedureExecutor$CompletedProcedureCleaner: Evict completed > procedure: Procedure=EnableTableProcedure (table=testTable) (id=9, > owner=seenu, state=FINISHED, startTime=7mins, 12sec ago, lastUpdate=7mins, > 12sec ago) > {noformat} > . > . > . > {color:red} > 2016-11-09 20:10:48,408 DEBUG [host-1,16000,1478691456965_ChoreService_1] > master.HMaster: Not running balancer because 1 region(s) in transition: > {1890fa9c085dcc2ee0602f4bab069d10={1890fa9c085dcc2ee0602f4bab069d10 > state=OFFLINE, ts=1478692197716, server=host-1,16040,1478690163056}} > {color} > {panel} > region is continuously is in RIT, because new hmaster is trying to make the > region offline request sending to dead regionserver. -- This message was sent by Atlassian JIRA (v7.6.3#76005)