[
https://issues.apache.org/jira/browse/HBASE-12150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell updated HBASE-12150:
-----------------------------------
Attachment: HBASE-12150.patch
Proposal for 0.98.
Setting Patch Available, but the Jenkins precommit build will fail to apply the
patch. Unit test results on 0.98:
{noformat}
Running org.apache.hadoop.hbase.protobuf.TestReplicationProtobuf
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.562 sec
Running org.apache.hadoop.hbase.client.replication.TestReplicationAdmin
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.378 sec
Running org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.746 sec
Running org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.767 sec
Running org.apache.hadoop.hbase.replication.TestReplicationKillMasterRS
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.641 sec
Running org.apache.hadoop.hbase.replication.TestReplicationStateZKImpl
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.264 sec
Running org.apache.hadoop.hbase.replication.TestReplicationChangingPeerRegionservers
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.597 sec
Running org.apache.hadoop.hbase.replication.TestReplicationDisableInactivePeer
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.466 sec
Running org.apache.hadoop.hbase.replication.TestPerTableCFReplication
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 41.392 sec
Running org.apache.hadoop.hbase.replication.TestReplicationSyncUpTool
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 62.983 sec
Running org.apache.hadoop.hbase.replication.regionserver.TestReplicationThrottler
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.558 sec
Running org.apache.hadoop.hbase.replication.regionserver.TestReplicationSink
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.779 sec
Running org.apache.hadoop.hbase.replication.regionserver.TestReplicationSourceManager
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.123 sec
Running org.apache.hadoop.hbase.replication.regionserver.TestReplicationHLogReaderManager
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 109.035 sec
Running org.apache.hadoop.hbase.replication.regionserver.TestReplicationSinkManager
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.371 sec
Running org.apache.hadoop.hbase.replication.TestReplicationSource
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.888 sec
Running org.apache.hadoop.hbase.replication.TestMasterReplication
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 61.641 sec
Running org.apache.hadoop.hbase.replication.TestReplicationTrackerZKImpl
Tests run: 4, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 2.163 sec
Running org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.789 sec
Running org.apache.hadoop.hbase.replication.TestReplicationSmallTests
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 57.796 sec
Running org.apache.hadoop.hbase.replication.TestReplicationWithTags
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.179 sec
Results :
Tests run: 59, Failures: 0, Errors: 0, Skipped: 2
{noformat}
> Backport replication changes from HBASE-12145
> ---------------------------------------------
>
> Key: HBASE-12150
> URL: https://issues.apache.org/jira/browse/HBASE-12150
> Project: HBase
> Issue Type: Task
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Fix For: 0.98.7
>
> Attachments: HBASE-12150.patch
>
>
> HBASE-12145 makes all zk accesses synchronized in RecoverableZooKeeper in
> branch-1+:
> {code}
> @@ -690,23 +692,23 @@ public class RecoverableZooKeeper {
> return newData;
> }
>
> - public long getSessionId() {
> - return zk == null ? null : zk.getSessionId();
> + public synchronized long getSessionId() {
> + return zk == null ? -1 : zk.getSessionId();
> }
>
> - public void close() throws InterruptedException {
> + public synchronized void close() throws InterruptedException {
> if (zk != null) zk.close();
> }
>
> - public States getState() {
> + public synchronized States getState() {
> return zk == null ? null : zk.getState();
> }
>
> - public ZooKeeper getZooKeeper() {
> + public synchronized ZooKeeper getZooKeeper() {
> return zk;
> }
>
> - public byte[] getSessionPasswd() {
> + public synchronized byte[] getSessionPasswd() {
> return zk == null ? null : zk.getSessionPasswd();
> }
> {code}
> It also makes this change:
> {code}
> @@ -391,8 +390,14 @@ public class ReplicationPeersZKImpl extends ReplicationStateZKBase implements Re
> if (peer == null) {
> return false;
> }
> - ((ConcurrentMap<String, ReplicationPeerZKImpl>) peerClusters).putIfAbsent(peerId, peer);
> - LOG.info("Added new peer cluster " + peer.getPeerConfig().getClusterKey());
> + ReplicationPeerZKImpl previous =
> + ((ConcurrentMap<String, ReplicationPeerZKImpl>) peerClusters).putIfAbsent(peerId, peer);
> + if (previous == null) {
> + LOG.info("Added new peer cluster=" + peer.getPeerConfig().getClusterKey());
> + } else {
> + LOG.info("Peer already present, " + previous.getPeerConfig().getClusterKey() +
> + ", new cluster=" + peer.getPeerConfig().getClusterKey());
> + }
> return true;
> }
> {code}
> We should keep the 0.98 code in sync with these changes because they affect
> correctness. I would like to avoid "this change works in branch-1 or master but
> breaks in some weird way in 0.98" issues.
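
For readers who want to see why the two quoted changes matter for correctness,
here is a minimal standalone sketch. It is not HBase or ZooKeeper code:
PeerRegistrySketch, FakeSession, Wrapper, attach, register and the cluster keys
are hypothetical names made up for illustration. It assumes only what the diffs
show: the old getSessionId() returned null from a method declared to return
long, and the old peer registration ignored the return value of putIfAbsent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch only: none of these types are HBase or ZooKeeper classes.
class PeerRegistrySketch {

    /** Hypothetical stand-in for a client handle that exposes a session id. */
    static class FakeSession {
        private final long id;
        FakeSession(long id) { this.id = id; }
        long getSessionId() { return id; }
    }

    /** Hypothetical wrapper mirroring the shape of the accessors in the first diff. */
    static class Wrapper {
        private FakeSession zk; // null until a session is attached

        synchronized void attach(FakeSession session) { this.zk = session; }

        // Pre-patch shape: the conditional has type Long, so a null handle is
        // unboxed on return and throws NullPointerException.
        long getSessionIdOld() {
            return zk == null ? null : zk.getSessionId();
        }

        // Patched shape: synchronized, and -1 signals "no session".
        synchronized long getSessionId() {
            return zk == null ? -1 : zk.getSessionId();
        }
    }

    /** Registers a peer and reports whether an earlier entry was kept instead. */
    static String register(ConcurrentMap<String, String> peers, String peerId, String clusterKey) {
        // putIfAbsent returns the existing value, or null if this call inserted.
        String previous = peers.putIfAbsent(peerId, clusterKey);
        if (previous == null) {
            return "Added new peer cluster=" + clusterKey;
        }
        return "Peer already present, " + previous + ", new cluster=" + clusterKey;
    }

    public static void main(String[] args) {
        Wrapper w = new Wrapper();
        System.out.println(w.getSessionId());   // no session attached yet
        w.attach(new FakeSession(42L));
        System.out.println(w.getSessionId());

        ConcurrentMap<String, String> peers = new ConcurrentHashMap<>();
        System.out.println(register(peers, "1", "zk-a:2181:/hbase"));
        System.out.println(register(peers, "1", "zk-b:2181:/hbase"));
    }
}
```

In this sketch the second register call leaves the first cluster key in the map,
which is the case the old log line would have reported as a newly added peer.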