[
https://issues.apache.org/jira/browse/HBASE-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240831#comment-15240831
]
Heng Chen commented on HBASE-15406:
-----------------------------------
I tested it on a cluster with 3 region servers (RS); the Hadoop version is 2.5.0.
1. Run {{bin/hbase hbck -abort -disableSplitAndMerge}}
{code}
HBaseFsck command line options: -abort -disableSplitAndMerge
2016-04-14 16:48:16,307 INFO [main] util.HBaseFsck: Launching hbck
2016-04-14 16:48:16,315 INFO [main-SendThread(10.11.51.79:2181)]
zookeeper.ClientCnxn: Opening socket connection to server
10.11.51.79/10.11.51.79:2181. Will not attempt to authenticate using SASL
(unknown error)
2016-04-14 16:48:16,360 INFO [main] zookeeper.RecoverableZooKeeper: Process
identifier=hconnection-0x5b38c1ec connecting to ZooKeeper
ensemble=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
2016-04-14 16:48:16,360 INFO [main] zookeeper.ZooKeeper: Initiating client
connection,
connectString=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
sessionTimeout=90000 watcher=hconnection-0x5b38c1ec0x0,
quorum=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181,
baseZNode=/hbase-test-cluster-15406
2016-04-14 16:48:16,361 INFO [main-SendThread(10.11.51.79:2181)]
zookeeper.ClientCnxn: Socket connection established to
10.11.51.79/10.11.51.79:2181, initiating session
2016-04-14 16:48:16,362 INFO [main-SendThread(10.11.51.79:2181)]
zookeeper.ClientCnxn: Opening socket connection to server
10.11.51.79/10.11.51.79:2181. Will not attempt to authenticate using SASL
(unknown error)
2016-04-14 16:48:16,362 INFO [main-SendThread(10.11.51.79:2181)]
zookeeper.ClientCnxn: Socket connection established to
10.11.51.79/10.11.51.79:2181, initiating session
2016-04-14 16:48:16,368 INFO [main-SendThread(10.11.51.79:2181)]
zookeeper.ClientCnxn: Session establishment complete on server
10.11.51.79/10.11.51.79:2181, sessionid = 0x750c1c0af785fd7, negotiated timeout
= 40000
2016-04-14 16:48:16,368 INFO [main-SendThread(10.11.51.79:2181)]
zookeeper.ClientCnxn: Session establishment complete on server
10.11.51.79/10.11.51.79:2181, sessionid = 0x750c1c0af785fd8, negotiated timeout
= 40000
Version: 2.0.0-SNAPSHOT
Number of live region servers: 3
Number of dead region servers: 0
Master: dx-pipe-sata60-pm,16000,1460623655646
Number of backup masters: 0
Average load: 0.6666666666666666
Number of requests: 0
Number of regions: 2
Number of regions in transition: 0
2016-04-14 16:48:17,130 INFO [main] util.HBaseFsck: Loading regionsinfo from
the hbase:meta table
Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
2016-04-14 16:48:17,240 INFO [main] util.HBaseFsck: getHTableDescriptors ==
tableNames => []
2016-04-14 16:48:17,242 INFO [main] zookeeper.RecoverableZooKeeper: Process
identifier=hconnection-0x3724af13 connecting to ZooKeeper
ensemble=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
2016-04-14 16:48:17,242 INFO [main] zookeeper.ZooKeeper: Initiating client
connection,
connectString=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
sessionTimeout=90000 watcher=hconnection-0x3724af130x0,
quorum=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181,
baseZNode=/hbase-test-cluster-15406
2016-04-14 16:48:17,245 INFO [main-SendThread(10.11.51.78:2181)]
zookeeper.ClientCnxn: Opening socket connection to server
10.11.51.78/10.11.51.78:2181. Will not attempt to authenticate using SASL
(unknown error)
2016-04-14 16:48:17,245 INFO [main-SendThread(10.11.51.78:2181)]
zookeeper.ClientCnxn: Socket connection established to
10.11.51.78/10.11.51.78:2181, initiating session
2016-04-14 16:48:17,246 INFO [main-SendThread(10.11.51.78:2181)]
zookeeper.ClientCnxn: Session establishment complete on server
10.11.51.78/10.11.51.78:2181, sessionid = 0x650c1c0cfd8b175, negotiated timeout
= 40000
2016-04-14 16:48:17,258 INFO [main] client.ConnectionImplementation: Closing
master protocol: MasterService
2016-04-14 16:48:17,258 INFO [main] client.ConnectionImplementation: Closing
zookeeper sessionid=0x650c1c0cfd8b175
2016-04-14 16:48:17,259 INFO [main] zookeeper.ZooKeeper: Session:
0x650c1c0cfd8b175 closed
Number of Tables: 0
2016-04-14 16:48:17,262 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down for session: 0x650c1c0cfd8b175
2016-04-14 16:48:17,340 INFO [main] util.HBaseFsck: Loading region directories
from HDFS
2016-04-14 16:48:17,449 INFO [main] util.HBaseFsck: Loading region information
from HDFS
2016-04-14 16:48:17,590 INFO [main] util.HBaseFsck: Checking and fixing region
consistency
2016-04-14 16:48:17,626 INFO [main] util.HBaseFsck: Handling overlap merges in
parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
2016-04-14 16:48:17,633 INFO [main] util.HBaseFsck: Abort hbck!!!
2016-04-14 16:48:17,639 INFO [Thread-4] zookeeper.ZooKeeper: Session:
0x750c1c0af785fd7 closed
2016-04-14 16:48:17,639 INFO [Thread-4] client.ConnectionImplementation:
Closing master protocol: MasterService
2016-04-14 16:48:17,640 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down for session: 0x750c1c0af785fd7
2016-04-14 16:48:17,640 INFO [Thread-4] client.ConnectionImplementation:
Closing zookeeper sessionid=0x750c1c0af785fd8
2016-04-14 16:48:17,642 INFO [Thread-4] zookeeper.ZooKeeper: Session:
0x750c1c0af785fd8 closed
2016-04-14 16:48:17,643 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down for session: 0x750c1c0af785fd8
{code}
2. Open the shell and try the {{splitormerge_switch}} command. It is rejected because the lock is still held:
{code}
[maintain@dx-pipe-sata60-pm hbase-2.0.0-SNAPSHOT]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/maintain/hadoop/hbase/hbase-2.0.0-SNAPSHOT/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/maintain/hadoop/hadoop-2.5.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 2.0.0-SNAPSHOT, r751cee2c5fa87ea15e4132606fa23e70a479c336, Thu Apr 14
16:38:25 CST 2016
hbase(main):001:0> splitormerge_switch 'SPLIT', true
ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: can't set splitOrMerge
switch due to lock
at
org.apache.hadoop.hbase.master.MasterRpcServices.setSplitOrMergeEnabled(MasterRpcServices.java:1501)
at
org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:61521)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2250)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:137)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:112)
at java.lang.Thread.run(Thread.java:745)
Here is some help for this command:
Enable/Disable one switch. You can set switch type 'SPLIT' or 'MERGE'. Returns
previous split state.
Examples:
hbase> splitormerge_switch 'SPLIT', true
hbase> splitormerge_switch 'SPLIT', false
nil
hbase(main):002:0>
{code}
Try {{splitormerge_enabled 'SPLIT'}}; you will see that the split switch is still disabled:
{code}
[maintain@dx-pipe-sata60-pm hbase-2.0.0-SNAPSHOT]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/maintain/hadoop/hbase/hbase-2.0.0-SNAPSHOT/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/maintain/hadoop/hadoop-2.5.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 2.0.0-SNAPSHOT, r751cee2c5fa87ea15e4132606fa23e70a479c336, Thu Apr 14
16:38:25 CST 2016
hbase(main):001:0> splitormerge_enabled 'SPLIT'
false
0 row(s) in 0.2940 seconds
{code}
3. Rerun {{bin/hbase hbck -disableSplitAndMerge}}, this time without {{-abort}}, and let it run to completion
{code}
= 0x750c1c0af7866fa, negotiated timeout = 40000
Version: 2.0.0-SNAPSHOT
Number of live region servers: 3
Number of dead region servers: 0
Master: dx-pipe-sata60-pm,16000,1460623655646
Number of backup masters: 0
Average load: 0.6666666666666666
Number of requests: 0
Number of regions: 2
Number of regions in transition: 0
2016-04-14 16:53:44,459 INFO [main] util.HBaseFsck: Loading regionsinfo from
the hbase:meta table
Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
2016-04-14 16:53:44,559 INFO [main] util.HBaseFsck: getHTableDescriptors ==
tableNames => [hbase:namespace]
2016-04-14 16:53:44,560 INFO [main] zookeeper.RecoverableZooKeeper: Process
identifier=hconnection-0x3724af13 connecting to ZooKeeper
ensemble=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
2016-04-14 16:53:44,560 INFO [main] zookeeper.ZooKeeper: Initiating client
connection,
connectString=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181
sessionTimeout=90000 watcher=hconnection-0x3724af130x0,
quorum=dx-pipe-zk1-online:2181,dx-pipe-zk2-online:2181,dx-pipe-zk3-online:2181,dx-pipe-zk4-online:2181,dx-pipe-zk5-online:2181,
baseZNode=/hbase-test-cluster-15406
2016-04-14 16:53:44,562 INFO [main-SendThread(10.11.51.79:2181)]
zookeeper.ClientCnxn: Opening socket connection to server
10.11.51.79/10.11.51.79:2181. Will not attempt to authenticate using SASL
(unknown error)
2016-04-14 16:53:44,563 INFO [main-SendThread(10.11.51.79:2181)]
zookeeper.ClientCnxn: Socket connection established to
10.11.51.79/10.11.51.79:2181, initiating session
2016-04-14 16:53:44,564 INFO [main-SendThread(10.11.51.79:2181)]
zookeeper.ClientCnxn: Session establishment complete on server
10.11.51.79/10.11.51.79:2181, sessionid = 0x750c1c0af7866fb, negotiated timeout
= 40000
2016-04-14 16:53:44,578 INFO [main] client.ConnectionImplementation: Closing
master protocol: MasterService
2016-04-14 16:53:44,579 INFO [main] client.ConnectionImplementation: Closing
zookeeper sessionid=0x750c1c0af7866fb
2016-04-14 16:53:44,580 INFO [main] zookeeper.ZooKeeper: Session:
0x750c1c0af7866fb closed
Number of Tables: 1
2016-04-14 16:53:44,583 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down for session: 0x750c1c0af7866fb
2016-04-14 16:53:44,599 INFO [main] util.HBaseFsck: Loading region directories
from HDFS
2016-04-14 16:53:44,693 INFO [main] util.HBaseFsck: Loading region information
from HDFS
2016-04-14 16:53:44,879 INFO [main] util.HBaseFsck: Checking and fixing region
consistency
2016-04-14 16:53:44,914 INFO [main] util.HBaseFsck: Handling overlap merges in
parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
2016-04-14 16:53:44,929 INFO [main] util.HBaseFsck: Computing mapping of all
store files
2016-04-14 16:53:44,946 INFO [main] util.HBaseFsck: Validating mapping using
HDFS state
Summary:
Table hbase:meta is okay.
Number of regions: 1
Deployed on: dx-pipe-sata60-pm,16000,1460623655646
Table hbase:namespace is okay.
Number of regions: 1
Deployed on: dx-pipe-sata60-pm,16000,1460623655646
0 inconsistencies detected.
Status: OK
2016-04-14 16:53:44,992 INFO [main] zookeeper.ZooKeeper: Session:
0x750c1c0af7866fa closed
2016-04-14 16:53:44,993 INFO [main] client.ConnectionImplementation: Closing
master protocol: MasterService
2016-04-14 16:53:44,993 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down for session: 0x750c1c0af7866fa
2016-04-14 16:53:44,994 INFO [main] client.ConnectionImplementation: Closing
zookeeper sessionid=0x550c1c0af77096e
2016-04-14 16:53:44,996 INFO [main] zookeeper.ZooKeeper: Session:
0x550c1c0af77096e closed
2016-04-14 16:53:44,996 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down for session: 0x550c1c0af77096e
{code}
4. Try the {{splitormerge_enabled}} command; you will see that the switch has been set back:
{code}
[maintain@dx-pipe-sata60-pm hbase-2.0.0-SNAPSHOT]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/maintain/hadoop/hbase/hbase-2.0.0-SNAPSHOT/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/maintain/hadoop/hadoop-2.5.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 2.0.0-SNAPSHOT, r751cee2c5fa87ea15e4132606fa23e70a479c336, Thu Apr 14
16:38:25 CST 2016
hbase(main):001:0> splitormerge_enabled 'SPLIT'
true
0 row(s) in 0.3910 seconds
hbase(main):002:0>
{code}
Try {{splitormerge_switch 'SPLIT', true}}; the lock is no longer held.
{code}
[maintain@dx-pipe-sata60-pm hbase-2.0.0-SNAPSHOT]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/maintain/hadoop/hbase/hbase-2.0.0-SNAPSHOT/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/maintain/hadoop/hadoop-2.5.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 2.0.0-SNAPSHOT, r751cee2c5fa87ea15e4132606fa23e70a479c336, Thu Apr 14
16:38:25 CST 2016
hbase(main):001:0> splitormerge_switch 'SPLIT', true
true
0 row(s) in 0.3060 seconds
hbase(main):002:0>
{code}
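The behaviour observed in steps 1 to 4 can be summarised as a small state machine. The sketch below is a toy Python model, not HBase code: the class {{SplitMergeSwitch}}, its {{saved}} field, and the rule that a rerun keeps the originally saved state are all assumptions inferred from the shell output above, not taken from the patch.

```python
class SplitMergeSwitch:
    """Toy model of the split switch and the lock hbck appears to hold on it."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self.saved = None  # state remembered while the lock is held (assumed)

    @property
    def locked(self):
        return self.saved is not None

    def set_switch(self, value):
        """Like splitormerge_switch: refused while locked, returns the previous state."""
        if self.locked:
            raise RuntimeError("can't set splitOrMerge switch due to lock")
        previous, self.enabled = self.enabled, value
        return previous

    def hbck(self, abort=False):
        """Model of hbck -disableSplitAndMerge, optionally aborted early."""
        if not self.locked:
            self.saved = self.enabled  # remember the original state only once
        self.enabled = False
        if abort:
            return  # early exit: the lock and the disabled switch are left behind
        self.enabled, self.saved = self.saved, None  # restore and release the lock


switch = SplitMergeSwitch(enabled=True)
switch.hbck(abort=True)          # step 1: aborted run
print(switch.enabled)            # step 2: switch is still disabled
switch.hbck()                    # step 3: complete run
print(switch.set_switch(True))   # step 4: accepted again, returns previous state
```

In this model the aborted run leaves both the disabled switch and the lock behind, and only a run that completes restores the saved value, which matches what the shell reported in steps 2 and 4.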
> Split / merge switch left disabled after early termination of hbck
> ------------------------------------------------------------------
>
> Key: HBASE-15406
> URL: https://issues.apache.org/jira/browse/HBASE-15406
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-15406.patch, HBASE-15406.v1.patch,
> HBASE-15406_v1.patch, HBASE-15406_v2.patch, test.patch, wip.patch
>
>
> This was what I did on a cluster with 1.4.0-SNAPSHOT built Thursday:
> 1. Run 'hbase hbck -disableSplitAndMerge' on the gateway node of the cluster
> 2. Terminate hbck early
> 3. Enter the hbase shell, where I observed:
> {code}
> hbase(main):001:0> splitormerge_enabled 'SPLIT'
> false
> 0 row(s) in 0.3280 seconds
> hbase(main):002:0> splitormerge_enabled 'MERGE'
> false
> 0 row(s) in 0.0070 seconds
> {code}
> Expectation is that the split / merge switches should be restored to default
> value after hbck exits.
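One general way to meet an expectation like this is to pair the disable step with a restore step that runs on every exit path. The sketch below shows that pattern in Python with a context manager; it is an illustration only, not the approach taken in the attached patches, and the name {{switches_disabled}} and the plain dict standing in for the cluster switches are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def switches_disabled(switches):
    """Disable every switch in the dict, then restore the saved values on exit."""
    saved = dict(switches)          # remember the values seen on entry
    for name in switches:
        switches[name] = False
    try:
        yield switches
    finally:
        switches.update(saved)      # runs on normal exit and on exceptions


state = {"SPLIT": True, "MERGE": True}
try:
    with switches_disabled(state):
        raise KeyboardInterrupt     # simulate terminating the tool early
except KeyboardInterrupt:
    pass
print(state)
```

A {{finally}} block covers exceptions and Ctrl-C, but not a process that is killed outright, so a tool relying on this pattern alone would still need some other cleanup path for that case.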