[jira] [Commented] (HBASE-18363) Hbck option to undeploy in memory replica parent region

huaxiang sun (JIRA) Tue, 18 Jul 2017 18:34:38 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-18363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092454#comment-16092454
 ]


huaxiang sun commented on HBASE-18363:
--------------------------------------

I checked the hbck code; "-fixAssignments" should be able to fix this in-memory
state. I simulated this case:
{code}
2017-07-18 18:19:10,192 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-18 18:19:10,192 INFO  [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50014 closed
2017-07-18 18:19:10,192 INFO  [main] util.HBaseFsck: Checking and fixing region consistency
*ERROR: Region { meta => null, hdfs => null, deployed => dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520;t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285., replicaId => 1 } not in META, but deployed on dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
ERROR: No regioninfo in Meta or HDFS. { meta => null, hdfs => null, deployed => dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520;t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285., replicaId => 1 }*
2017-07-18 18:19:10,200 INFO  [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
2017-07-18 18:19:10,205 INFO  [main] util.HBaseFsck: Computing mapping of all store files

2017-07-18 18:19:10,214 INFO  [main] util.HBaseFsck: Validating mapping using HDFS state
2017-07-18 18:19:10,220 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,220 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,221 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 18:19:10,222 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:60970, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,223 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50016, negotiated timeout = 40000
2017-07-18 18:19:10,230 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-18 18:19:10,230 INFO  [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50016 closed
2017-07-18 18:19:10,231 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,231 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,232 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 18:19:10,233 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:60971, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,234 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50017, negotiated timeout = 40000
2017-07-18 18:19:10,236 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-18 18:19:10,236 INFO  [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50017 closed
2017-07-18 18:19:10,236 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=localhost:2181
2017-07-18 18:19:10,236 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hbase Fsck0x0, quorum=localhost:2181, baseZNode=/hbase
2017-07-18 18:19:10,238 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 18:19:10,238 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /127.0.0.1:60972, server: localhost/127.0.0.1:2181
2017-07-18 18:19:10,239 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15d5869d2f50018, negotiated timeout = 40000
2017-07-18 18:19:10,258 INFO  [main] zookeeper.ZooKeeper: Session: 0x15d5869d2f50018 closed
2017-07-18 18:19:10,258 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
Summary:

Table hbase:meta is okay.
    Number of regions: 1
    Deployed on:  dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table t1 is okay.
    Number of regions: 4
    Deployed on:  dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table hbase:quota is okay.
    Number of regions: 1
    Deployed on:  dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
Table hbase:namespace is okay.
    Number of regions: 1
    Deployed on:  dhcp-172-16-1-203.pa.cloudera.com,60863,1500426918520
1 inconsistencies detected.

{code}
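The deployed region in the ERROR lines above, "t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285.", is a replica region: the "_0001" suffix after the region id matches the "replicaId => 1" that hbck reports, while a primary region carries no suffix (replica 0). The sketch below splits such a name into its parts to make that mapping explicit. It is a hypothetical helper (the class ReplicaRegionName and its parse method are not HBase APIs), and it assumes the replica suffix is a four-digit hexadecimal number, which I believe is how HBase formats it; treat that as an assumption.

```java
// Hypothetical helper, NOT part of HBase. Splits a region name of the form
//   <table>,<startKey>,<regionId>[_<replicaId>].<encodedName>.
// Assumption: the "_NNNN" replica suffix is four hexadecimal digits.
public final class ReplicaRegionName {

    /** Parsed components of a region name. */
    public static final class Parsed {
        public final String table;
        public final String startKey;
        public final long regionId;
        public final int replicaId;
        public final String encodedName;

        Parsed(String table, String startKey, long regionId, int replicaId, String encodedName) {
            this.table = table;
            this.startKey = startKey;
            this.regionId = regionId;
            this.replicaId = replicaId;
            this.encodedName = encodedName;
        }
    }

    public static Parsed parse(String regionName) {
        // The table name ends at the first comma; the region id follows the last
        // comma, so a start key that itself contains commas is still handled.
        int firstComma = regionName.indexOf(',');
        int lastComma = regionName.lastIndexOf(',');
        if (firstComma < 0 || lastComma == firstComma) {
            throw new IllegalArgumentException("Not a region name: " + regionName);
        }
        String table = regionName.substring(0, firstComma);
        String startKey = regionName.substring(firstComma + 1, lastComma);

        // The remainder looks like "1500328224175_0001.d761...9285." (trailing dot included).
        String rest = regionName.substring(lastComma + 1);
        if (rest.endsWith(".")) {
            rest = rest.substring(0, rest.length() - 1);
        }
        int dot = rest.indexOf('.');
        String idPart = dot >= 0 ? rest.substring(0, dot) : rest;
        String encodedName = dot >= 0 ? rest.substring(dot + 1) : "";

        // Primary regions have no "_NNNN" suffix and are treated as replicaId 0.
        int underscore = idPart.indexOf('_');
        long regionId = Long.parseLong(underscore >= 0 ? idPart.substring(0, underscore) : idPart);
        int replicaId = underscore >= 0 ? Integer.parseInt(idPart.substring(underscore + 1), 16) : 0;

        return new Parsed(table, startKey, regionId, replicaId, encodedName);
    }

    public static void main(String[] args) {
        Parsed p = parse("t1,r1,1500328224175_0001.d761ef3cc03d8a0124bb751f216f9285.");
        System.out.println(p.table + " " + p.startKey + " " + p.regionId
                + " replicaId=" + p.replicaId + " " + p.encodedName);
    }
}
```

Running main on the name from the log prints "t1 r1 1500328224175 replicaId=1 d761ef3cc03d8a0124bb751f216f9285", i.e. replica 1 of the (now gone) parent region 1500328224175 of table t1.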

I was able to fix this issue by running "hbase hbck -fixAssignments".

Resolving it as Invalid.

> Hbck option to undeploy in memory replica parent region 
> --------------------------------------------------------
>
>                 Key: HBASE-18363
>                 URL: https://issues.apache.org/jira/browse/HBASE-18363
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck
>    Affects Versions: 1.4.0, 2.0.0-alpha-1
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>            Priority: Minor
>
> We have run into cases where, with read replicas, the parent replica region is
> sometimes left in the master's in-memory onlineRegion list after a split. As a
> result, the region gets assigned to a region server. Although the root cause
> will be fixed by HBASE-18025, we need to enhance the hbck tool to fix this
> in-memory state. Currently, hbck allows this fix only for the primary region
> (in this case, the primary region is gone) with the -fixAssignments option;
> see the following line of code. We will enhance it so that it can be applied
> to replica regions as well.
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java#L2216



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
