[ https://issues.apache.org/jira/browse/HBASE-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-17570.
---------------------------
Resolution: Duplicate
Fixed by HBASE-17350
> rsgroup server move can get stuck if unassigning fails
> ------------------------------------------------------
>
> Key: HBASE-17570
> URL: https://issues.apache.org/jira/browse/HBASE-17570
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver
> Reporter: stack
> Fix For: 2.0.0
>
>
> This is pretty easy to repro in a standalone setup on the master branch. The
> master branch has the 'fake' Master regionserver, which shows up as a
> regionserver in the rsgroup 'default' group. If I create a new group and then
> try moving servers to the new group, it will usually get stuck in the loop
> below, and it never breaks out (I have to kill the master).
> Looking at the code, RSGroupAdminServer#moveServers has a loop in it that
> will go on forever; there is neither a timeout nor a maximum number of tries.
> Maybe we don't see this much in a 'real' cluster. Filing this issue in the
> meantime because the move needs to stop retrying forever and fail instead.
> {code}
> 2017-01-30 21:34:46,340 INFO
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> rsgroup.RSGroupAdminServer: Unassigning 1 regions from server localhost:50143
> for move to xx
> 2017-01-30 21:34:46,341 INFO
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=OPEN,
> ts=1485840806167, server=localhost,50143,1485840800161} to
> {8ebaa5bd7a2e906429a7b91bb2bee333 state=PENDING_CLOSE, ts=1485840886341,
> server=localhost,50143,1485840800161}
> 2017-01-30 21:34:46,341 INFO
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.RegionStateStore: Updating hbase:meta row
> hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with
> state=PENDING_CLOSE
> 2017-01-30 21:34:46,347 INFO
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50143]
> regionserver.RSRpcServices: Close 8ebaa5bd7a2e906429a7b91bb2bee333 without
> moving
> 2017-01-30 21:34:46,348 INFO [RS_CLOSE_REGION-localhost:50143-0]
> regionserver.HRegion: Flushing 1/1 column families, memstore=431 B
> 2017-01-30 21:34:46,406 INFO [RS_CLOSE_REGION-localhost:50143-0]
> regionserver.DefaultStoreFlusher: Flushed, sequenceid=7, memsize=431,
> hasBloomFilter=true, into tmp file
> file:/var/folders/d8/8lyxycpd129d4fj7lb684dwh0000gp/T/hbase-stack/hbase/data/hbase/rsgroup/8ebaa5bd7a2e906429a7b91bb2bee333/.tmp/m/999d93adf36b4406bb73dc64e0158a05
> 2017-01-30 21:34:46,422 INFO [RS_CLOSE_REGION-localhost:50143-0]
> regionserver.HStore: Added
> file:/var/folders/d8/8lyxycpd129d4fj7lb684dwh0000gp/T/hbase-stack/hbase/data/hbase/rsgroup/8ebaa5bd7a2e906429a7b91bb2bee333/m/999d93adf36b4406bb73dc64e0158a05,
> entries=2, sequenceid=7, filesize=4.9 K
> 2017-01-30 21:34:46,422 INFO [RS_CLOSE_REGION-localhost:50143-0]
> regionserver.HRegion: Finished memstore flush of ~431 B/431, currentsize=0
> B/0 for region hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
> in 74ms, sequenceid=7, compaction requested=false
> 2017-01-30 21:34:46,425 INFO
> [StoreCloserThread-hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.-1]
> regionserver.HStore: Closed m
> 2017-01-30 21:34:46,437 INFO [RS_CLOSE_REGION-localhost:50143-0]
> regionserver.HRegion: Closed
> hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
> 2017-01-30 21:34:46,440 INFO
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50141]
> master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333
> state=PENDING_CLOSE, ts=1485840886341, server=localhost,50143,1485840800161}
> to {8ebaa5bd7a2e906429a7b91bb2bee333 state=CLOSED, ts=1485840886440,
> server=localhost,50143,1485840800161}
> 2017-01-30 21:34:46,440 INFO
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50141]
> master.RegionStateStore: Updating hbase:meta row
> hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with
> state=CLOSED
> 2017-01-30 21:34:46,442 WARN [AM.-pool3-t1] balancer.BaseLoadBalancer:
> Wanted to do retain assignment but no servers to assign to
> 2017-01-30 21:34:46,442 WARN [AM.-pool3-t1] master.AssignmentManager: Can't
> find a destination for 8ebaa5bd7a2e906429a7b91bb2bee333
> 2017-01-30 21:34:46,442 WARN [AM.-pool3-t1] master.AssignmentManager: Unable
> to determine a plan to assign {ENCODED => 8ebaa5bd7a2e906429a7b91bb2bee333,
> NAME => 'hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.',
> STARTKEY => '', ENDKEY => ''}
> 2017-01-30 21:34:46,442 WARN [AM.-pool3-t1] master.RegionStates: Failed to
> open/close 8ebaa5bd7a2e906429a7b91bb2bee333 on localhost,50143,1485840800161,
> set to FAILED_OPEN
> 2017-01-30 21:34:46,442 INFO [AM.-pool3-t1] master.RegionStates: Transition
> {8ebaa5bd7a2e906429a7b91bb2bee333 state=CLOSED, ts=1485840886440,
> server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333
> state=FAILED_OPEN, ts=1485840886442, server=localhost,50143,1485840800161}
> 2017-01-30 21:34:46,442 INFO [AM.-pool3-t1] master.RegionStateStore:
> Updating hbase:meta row
> hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with
> state=FAILED_OPEN
> 2017-01-30 21:34:46,990 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
> server.NIOServerCnxnFactory: Accepted socket connection from
> /0:0:0:0:0:0:0:1:50272
> 2017-01-30 21:34:46,990 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
> server.ZooKeeperServer: Refusing session request for client
> /0:0:0:0:0:0:0:1:50272 as it has seen zxid 0x25e our last zxid is 0xae client
> must try another server
> 2017-01-30 21:34:46,990 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
> server.NIOServerCnxn: Closed socket connection for client
> /0:0:0:0:0:0:0:1:50272 (no session established for client)
> 2017-01-30 21:34:47,353 INFO
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> rsgroup.RSGroupAdminServer: Unassigning 2 regions from server localhost:50143
> for move to xx
> 2017-01-30 21:34:47,353 INFO
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333
> state=FAILED_OPEN, ts=1485840886442, server=localhost,50143,1485840800161} to
> {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE, ts=1485840887353,
> server=localhost,50143,1485840800161}
> 2017-01-30 21:34:47,353 INFO
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.RegionStateStore: Updating hbase:meta row
> hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with
> state=OFFLINE
> 2017-01-30 21:34:47,355 WARN
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> balancer.BaseLoadBalancer: Wanted to do retain assignment but no servers to
> assign to
> 2017-01-30 21:34:47,355 WARN
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.AssignmentManager: Can't find a destination for
> 8ebaa5bd7a2e906429a7b91bb2bee333
> 2017-01-30 21:34:47,355 WARN
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.AssignmentManager: Unable to determine a plan to assign {ENCODED =>
> 8ebaa5bd7a2e906429a7b91bb2bee333, NAME =>
> 'hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.', STARTKEY =>
> '', ENDKEY => ''}
> 2017-01-30 21:34:47,355 WARN
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.RegionStates: Failed to open/close 8ebaa5bd7a2e906429a7b91bb2bee333 on
> localhost,50143,1485840800161, set to FAILED_OPEN
> 2017-01-30 21:34:47,355 INFO
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333
> state=OFFLINE, ts=1485840887353, server=localhost,50143,1485840800161} to
> {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840887355,
> server=localhost,50143,1485840800161}
> 2017-01-30 21:34:47,355 INFO
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.RegionStateStore: Updating hbase:meta row
> hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with
> state=FAILED_OPEN
> 2017-01-30 21:34:47,356 INFO
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333
> state=FAILED_OPEN, ts=1485840887355, server=localhost,50143,1485840800161} to
> {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE, ts=1485840887356,
> server=localhost,50143,1485840800161}
> 2017-01-30 21:34:47,356 INFO
> [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141]
> master.RegionStateStore: Updating hbase:meta row
> hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with
> state=OFFLINE
> {code}
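
For illustration only, here is a minimal sketch of the kind of bound the description asks for: a retry loop that gives up after a maximum number of passes or once a deadline has passed, instead of spinning forever. It is not the actual RSGroupAdminServer#moveServers code and not necessarily what HBASE-17350 did. The class BoundedMove, the method moveWithLimits, the Result enum, and all parameters are hypothetical names; the IntSupplier stands in for "run one unassign pass and report how many regions are still on the source servers".

```java
import java.util.function.IntSupplier;

/** Hypothetical helper for illustration; not part of HBase. */
final class BoundedMove {

  /** Why the loop stopped. */
  enum Result { DONE, GAVE_UP_MAX_ATTEMPTS, GAVE_UP_TIMEOUT, INTERRUPTED }

  private BoundedMove() {
  }

  /**
   * Runs unassignPass until it reports 0 regions left, or until either
   * maxAttempts passes have run or timeoutMs has elapsed.
   *
   * @param unassignPass assumed stand-in for one unassign pass; returns the
   *                     number of regions still on the servers being moved
   * @param maxAttempts  maximum number of passes before giving up
   * @param timeoutMs    overall time budget in milliseconds
   * @param sleepMs      pause between passes in milliseconds
   */
  static Result moveWithLimits(IntSupplier unassignPass, int maxAttempts,
      long timeoutMs, long sleepMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (unassignPass.getAsInt() == 0) {
        return Result.DONE; // nothing left to unassign, the move can proceed
      }
      // Compare via subtraction, as recommended for System.nanoTime values.
      if (System.nanoTime() - deadline >= 0) {
        return Result.GAVE_UP_TIMEOUT;
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return Result.INTERRUPTED;
      }
    }
    return Result.GAVE_UP_MAX_ATTEMPTS;
  }
}
```

With a pass that never makes progress, as in the log above where the region keeps cycling between FAILED_OPEN and OFFLINE, a call such as BoundedMove.moveWithLimits(() -> 1, 3, 60000L, 1000L) returns GAVE_UP_MAX_ATTEMPTS after three passes, and the caller could then fail the move rather than hang the RPC handler.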