[
https://issues.apache.org/jira/browse/HBASE-18363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092454#comment-16092454
]
huaxiang sun commented on HBASE-18363:
--------------------------------------
I checked the hbck code; "-fixAssignments" should already be able to fix this
in-memory state. I simulated this case, and hbck reported the following:
{code}
2017-07-18 18:19:10,192 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down
2017-07-18 18:19:10,192 INFO [main] zookeeper.ZooKeeper: Session:
0x15d5869d2f50014 closed
2017-07-18 18:19:10,192 INFO [main] util.HBaseFsck: Checking and fixing region
consistency
*ERROR: Region { meta => null, hdfs => null, deployed =>
dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520;t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285.,
replicaId => 1 } not in META, but deployed on
dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
ERROR: No regioninfo in Meta or HDFS. { meta => null, hdfs => null, deployed =>
dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520;t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285.,
replicaId => 1 }*
2017-07-18 18:19:10,200 INFO [main] util.HBaseFsck: Handling overlap merges in
parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
2017-07-18 18:19:10,205 INFO [main] util.HBaseFsck: Computing mapping of all
store files
2017-07-18 18:19:10,214 INFO [main] util.HBaseFsck: Validating mapping using
HDFS state
2017-07-18 18:19:10,220 INFO [main] zookeeper.RecoverableZooKeeper: Process
identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,220 INFO [main] zookeeper.ZooKeeper: Initiating client
connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase
Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,221 INFO [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Opening socket connection to server
localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown
error)
2017-07-18 18:19:10,222 INFO [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Socket connection established, initiating session,
client: /127.0.0.1:60970, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,223 INFO [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Session establishment complete on server
localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50016, negotiated timeout =
40000
2017-07-18 18:19:10,230 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down
2017-07-18 18:19:10,230 INFO [main] zookeeper.ZooKeeper: Session:
0x15d5869d2f50016 closed
2017-07-18 18:19:10,231 INFO [main] zookeeper.RecoverableZooKeeper: Process
identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,231 INFO [main] zookeeper.ZooKeeper: Initiating client
connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase
Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,232 INFO [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Opening socket connection to server
localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown
error)
2017-07-18 18:19:10,233 INFO [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Socket connection established, initiating session,
client: /127.0.0.1:60971, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,234 INFO [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Session establishment complete on server
localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50017, negotiated timeout =
40000
2017-07-18 18:19:10,236 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down
2017-07-18 18:19:10,236 INFO [main] zookeeper.ZooKeeper: Session:
0x15d5869d2f50017 closed
2017-07-18 18:19:10,236 INFO [main] zookeeper.RecoverableZooKeeper: Process
identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,236 INFO [main] zookeeper.ZooKeeper: Initiating client
connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase
Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,238 INFO [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Opening socket connection to server
localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown
error)
2017-07-18 18:19:10,238 INFO [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Socket connection established, initiating session,
client: /127.0.0.1:60972, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,239 INFO [main-SendThread(localhost:2181)]
zookeeper.ClientCnxn: Session establishment complete on server
localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50018, negotiated timeout =
40000
2017-07-18 18:19:10,258 INFO [main] zookeeper.ZooKeeper: Session:
0x15d5869d2f50018 closed
Summary:2017-07-18 18:19:10,258 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down
Table hbase:meta is okay.
Number of regions: 1
Deployed on: dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table t1 is okay.
Number of regions: 4
Deployed on: dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table hbase:quota is okay.
Number of regions: 1
Deployed on: dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table hbase:namespace is okay.
Number of regions: 1
Deployed on: dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
1 inconsistencies detected.
{code}
I was able to fix this issue by running "hbase hbck -fixAssignments", so I am
resolving it as invalid.
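For anyone who wants to compare hbck results before and after running the fix, here is a minimal sketch in Python that pulls the ERROR lines and the inconsistency count out of captured hbck console output. The function name, the sample text, and the host name rs1.example.com are made up for illustration; the only things taken from the output above are the "ERROR:" prefix and the "N inconsistencies detected." summary line, and other hbck versions may format these differently.

```python
import re


def summarize_hbck_output(text):
    """Return (error_lines, inconsistency_count) from captured hbck output.

    Hypothetical helper: it assumes errors start with "ERROR:" and that the
    summary contains a line like "1 inconsistencies detected.".
    """
    errors = []
    for line in text.splitlines():
        # Drop surrounding whitespace and any "*" emphasis markers that a
        # JIRA or mail rendering may have wrapped around the line.
        cleaned = line.strip().strip("*").strip()
        if cleaned.startswith("ERROR:"):
            errors.append(cleaned)
    match = re.search(r"^(\d+) inconsistencies detected", text, re.MULTILINE)
    count = int(match.group(1)) if match else None
    return errors, count


# Shortened, made-up sample modeled on the output quoted above.
SAMPLE = """\
*ERROR: Region { replicaId => 1 } not in META, but deployed on rs1.example.com,60863,1500426918520
ERROR: No regioninfo in Meta or HDFS. { replicaId => 1 }*
Table t1 is okay.
1 inconsistencies detected.
"""

if __name__ == "__main__":
    found_errors, found_count = summarize_hbck_output(SAMPLE)
    print(found_count, "inconsistencies;", len(found_errors), "ERROR lines")
```

One way to use it would be to save the output of "hbase hbck" to a file, run "hbase hbck -fixAssignments", save the output of a second "hbase hbck" run, and compare the two counts.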
> Hbck option to undeploy in memory replica parent region
> --------------------------------------------------------
>
> Key: HBASE-18363
> URL: https://issues.apache.org/jira/browse/HBASE-18363
> Project: HBase
> Issue Type: Bug
> Components: hbck
> Affects Versions: 1.4.0, 2.0.0-alpha-1
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Priority: Minor
>
> With read replicas, we sometimes run into cases where, after a split, the
> parent replica region is left in the master's in-memory onlineRegion list. As
> a result, the region gets assigned to a region server. Although the root cause
> will be fixed by HBASE-18025, we need to enhance the hbck tool to fix this
> in-memory state. Currently, hbck only allows the fix for the primary region
> (in this case, the primary region is gone) with the -fixAssignments option;
> please see the following line of code. We will enhance it so that it can be
> applied to replica regions as well.
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java#L2216
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)