Siddharth Wagle created AMBARI-15361:
----------------------------------------

             Summary: Fix ordering of Alter table calls which could result in 
Region Close issue
                 Key: AMBARI-15361
                 URL: https://issues.apache.org/jira/browse/AMBARI-15361
             Project: Ambari
          Issue Type: Bug
          Components: ambari-metrics
    Affects Versions: 2.2.0
            Reporter: Siddharth Wagle
            Assignee: Siddharth Wagle
            Priority: Critical
             Fix For: 2.2.2


Two Alter table calls one of TTL and the other for setting the Region policies 
could result in a possible race condition on HBase side. 

Here is some details:
The assignment operation on the region in question 
({{2e13a2d264c50e27c32cd4d3cad6ebca}}) further indicates possible network 
partition - the region is CLOSED and OFFLINED successfully, and trying to OPEN 
again, but it found conflict information from zk node (the zk node had a stale 
information and still thought that it is still in the CLOSING state, which is 
not true based on the log) - the assignment succeeds, but updating the 
assignment information to zk failed due to incorrect version in the zk node.  
{noformat}
2016-02-18 00:05:46,603 INFO  
[os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,52093,1455753905161-org.apache.hadoop.hbase.master.BulkReOpen-0]
 master.RegionStates: Transition {2e13a2d264c50e27c32cd4d3cad6ebca state=OPEN, 
ts=1455753935244, 
server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} to 
{2e13a2d264c50e27c32cd4d3cad6ebca state=PENDING_CLOSE, ts=1455753946603, 
server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
2016-02-18 00:05:46,605 INFO  [PriorityRpcServer.handler=16,queue=0,port=41497] 
regionserver.RSRpcServices: Close 2e13a2d264c50e27c32cd4d3cad6ebca, moving to 
null
2016-02-18 00:05:46,629 INFO  
[StoreCloserThread-METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.-1]
 regionserver.HStore: Closed 0

2016-02-18 00:05:46,635 INFO  
[RS_CLOSE_REGION-os-s11-3-zxgfju-ambari-hv-ser-ha-3-1:41497-1] 
regionserver.HRegion: Closed 
METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.
2016-02-18 00:05:46,643 INFO  [AM.ZK.Worker-pool3-t1] master.RegionStates: 
Transition {2e13a2d264c50e27c32cd4d3cad6ebca state=PENDING_CLOSE, 
ts=1455753946603, 
server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} to 
{2e13a2d264c50e27c32cd4d3cad6ebca state=CLOSED, ts=1455753946643, 
server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
2016-02-18 00:05:46,644 INFO  [AM.ZK.Worker-pool3-t1] master.AssignmentManager: 
Setting node as OFFLINED in ZooKeeper for region {ENCODED => 
2e13a2d264c50e27c32cd4d3cad6ebca, NAME => 
'METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.', STARTKEY => 
'', ENDKEY => ''}
2016-02-18 00:05:46,644 INFO  [AM.ZK.Worker-pool3-t1] master.RegionStates: 
Transition {2e13a2d264c50e27c32cd4d3cad6ebca state=CLOSED, ts=1455753946643, 
server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} to 
{2e13a2d264c50e27c32cd4d3cad6ebca state=OFFLINE, ts=1455753946644, 
server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
2016-02-18 00:05:46,646 INFO  [AM.ZK.Worker-pool3-t1] master.AssignmentManager: 
Assigning METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. to 
os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723
2016-02-18 00:05:46,646 INFO  [AM.ZK.Worker-pool3-t1] master.RegionStates: 
Transition {2e13a2d264c50e27c32cd4d3cad6ebca state=OFFLINE, ts=1455753946644, 
server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723} to 
{2e13a2d264c50e27c32cd4d3cad6ebca state=PENDING_OPEN, ts=1455753946646, 
server=os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723}
2016-02-18 00:05:46,647 INFO  [PriorityRpcServer.handler=5,queue=1,port=41497] 
regionserver.RSRpcServices: Open 
METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.
2016-02-18 00:05:46,647 INFO  [PriorityRpcServer.handler=5,queue=1,port=41497] 
coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 
2e13a2d264c50e27c32cd4d3cad6ebca, NAME => 
'METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.', STARTKEY => 
'', ENDKEY => ''} failed, transitioning from OFFLINE to FAILED_OPEN in ZK, 
expecting version 2
2016-02-18 00:05:46,649 WARN  [PriorityRpcServer.handler=5,queue=1,port=41497] 
regionserver.RSRpcServices: Failed opening region 
METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.
org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: 
Received OPEN for the 
region:METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. , which 
we are already trying to CLOSE 
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1477)
        at 
org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22239)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
2016-02-18 00:05:46,653 WARN  [AM.ZK.Worker-pool3-t1] master.AssignmentManager: 
Failed assignment of 
METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca. to 
os-s11-3-zxgfju-ambari-hv-ser-ha-3-1.novalocal,41497,1455753908723, waiting a 
little before trying on the same region server try=1 of 10

...

2016-02-18 00:05:46,771 WARN  
[RS_OPEN_REGION-os-s11-3-zxgfju-ambari-hv-ser-ha-3-1:41497-0] 
coordination.ZkOpenRegionCoordination: Unable to mark region {ENCODED => 
2e13a2d264c50e27c32cd4d3cad6ebca, NAME => 
'METRIC_RECORD,,1455753934664.2e13a2d264c50e27c32cd4d3cad6ebca.', STARTKEY => 
'', ENDKEY => ''} as FAILED_OPEN. It's likely that the master already timed out 
this open attempt, and thus another RS already has the region.


016-02-18 00:05:47,038 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] 
server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x152f1b1bbcb0014, likely client has closed socket
        at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2016-02-18 00:05:47,039 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:61181] 
server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:51677 
which had sessionid 0x152f1b1bbcb0014
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to