[jira] Commented: (HBASE-2223) Handle 10min+ network partitions between clusters

HBase Review Board (JIRA) Wed, 26 May 2010 18:12:09 -0700

    [ https://issues.apache.org/jira/browse/HBASE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872051#action_12872051 ]


HBase Review Board commented on HBASE-2223:
-------------------------------------------

Message from: "Jean-Daniel Cryans"

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/76/
-----------------------------------------------------------

(Updated 2010-05-26 18:09:30.362592)


Review request for hbase.


Changes
-------

This new patch addresses almost all review comments, except the following:
ReplicationZookeeperHelper.java
- use a <pre> block to make this more readable in the HTML version of the 
javadoc.

ReplicationSink.java
- I think it would be good to document that this method will typically be 
called from a thread other than the one that executes `run', so that other 
people reading the code quickly grasp the concurrency/locking requirements.

- So Delete operations are "unbuffered", unlike Put operations, which you 
"buffer" in the `puts' list. Does that mean a Delete can be executed before 
the Put that originally created the data, so that the Delete fails first and 
the Put survives afterwards?

// Should we log rejected edits in a file for replay?
- I vote yes
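The three open ReplicationSink points (which thread calls the method, Deletes overtaking buffered Puts, and logging rejected edits for replay) can be illustrated together. The following is a minimal Java sketch under stated assumptions, not the patch's code: `SinkEdit`, `OrderedSinkSketch`, `replicateEntries`, `flush`, the empty-row rejection rule and the tab-separated log format are all hypothetical, and a `TreeMap` stands in for the slave table. It keeps Puts and Deletes in one ordered buffer guarded by a single lock, applies them strictly in arrival order, and writes rejected edits to a `Writer` that could be backed by a file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical edit record; NOT HBase's Put/Delete classes. */
final class SinkEdit {
  enum Type { PUT, DELETE }

  final Type type;
  final String row;
  final String value; // null for DELETE

  SinkEdit(Type type, String row, String value) {
    this.type = type;
    this.row = row;
    this.value = value;
  }

  /** Tab-separated line used by the hypothetical rejected-edits log. */
  @Override
  public String toString() {
    return type + "\t" + row + "\t" + (value == null ? "" : value);
  }
}

/**
 * Sketch: Puts and Deletes share ONE ordered buffer, so a Delete cannot
 * overtake the Put it was meant to remove.
 *
 * Threading contract: replicateEntries() is called by other threads (e.g. RPC
 * handlers), never by the thread that calls flush() (the `run' loop). The
 * buffer is guarded by `lock`; `store` is touched only by the flushing thread.
 */
final class OrderedSinkSketch {
  private final Object lock = new Object();
  private final List<SinkEdit> buffer = new ArrayList<>();   // guarded by lock
  private final Map<String, String> store = new TreeMap<>(); // stand-in for the slave table
  private final Writer rejectedLog;                          // replay log for rejected edits

  OrderedSinkSketch(Writer rejectedLog) {
    this.rejectedLog = rejectedLog;
  }

  /** Called from other threads: only enqueues, preserving arrival order. */
  void replicateEntries(List<SinkEdit> edits) {
    synchronized (lock) {
      buffer.addAll(edits);
    }
  }

  /** Called from the sink's own thread: applies buffered edits strictly in order. */
  void flush() {
    List<SinkEdit> batch;
    synchronized (lock) {
      batch = new ArrayList<>(buffer);
      buffer.clear();
    }
    try {
      for (SinkEdit e : batch) {
        if (e.row == null || e.row.isEmpty()) {
          // Hypothetical rejection rule: log the edit so it can be replayed later.
          rejectedLog.write(e + "\n");
        } else if (e.type == SinkEdit.Type.PUT) {
          store.put(e.row, e.value);
        } else {
          store.remove(e.row);
        }
      }
      rejectedLog.flush();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  /** Read helper for the flushing thread (or tests). */
  String get(String row) {
    return store.get(row);
  }
}
```

The design choice here is to give up the separate, unbuffered Delete path in favor of a single queue, so ordering follows from arrival order rather than from timing between two code paths.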


The major change was removing ReplicationConnectionManager and using HCM 
(HConnectionManager) directly, since it was the same code (so the comments left 
by Benoit still apply, but now to HCM). Other than that, it's mostly refactoring 
and nit fixes.


Summary
-------

This is HBASE-2223, a.k.a. Replication 2.0. It is currently only a "preview 
patch": it is essentially feature complete, works on a cluster and has unit 
tests, but it could use a lot more testing, cleanup and ideas from others.


This addresses bug HBASE-2223.
    http://issues.apache.org/jira/browse/HBASE-2223


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 13aff26 
  src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4cbe52a 
  src/main/java/org/apache/hadoop/hbase/master/ServerManager.java a197b8f 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b5ff43a 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 12a3cd8 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 7c1184c 
  src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/replication/package.html PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java ed8709f 
  src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSink.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java PRE-CREATION 

Diff: http://review.hbase.org/r/76/diff


Testing
-------


Thanks,

Jean-Daniel




> Handle 10min+ network partitions between clusters
> -------------------------------------------------
>
>                 Key: HBASE-2223
>                 URL: https://issues.apache.org/jira/browse/HBASE-2223
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2223.patch
>
>
> We need a clean way to handle long network partitions without impacting the 
> master cluster (which pushes the data). Currently it just retries over and 
> over again.
> I think we could:
>  - Stop replication to a slave cluster if it hasn't responded for more than 
> 10 minutes
>  - Keep track of the duration of the partition
>  - When the slave cluster comes back, start an MR job like HBASE-2221 
> Maybe we want less than 10 minutes; maybe we want all of this to be automatic, 
> or only the first 2 parts. Discuss.
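The first two steps of the proposal (stop replicating after roughly 10 minutes without a response, and track how long the partition lasts) could look roughly like the Java sketch below. `PartitionTracker` and all of its methods are hypothetical names, not HBase API; the clock is injected so the timing can be tested, and the returned `{start, end}` window is only what a catch-up job in the spirit of HBASE-2221 would need as input, not an implementation of that job.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical per-slave tracker (illustrative names, not HBase API).
 * Single-threaded: call it only from the thread that ships edits to this slave.
 */
final class PartitionTracker {
  static final long DEFAULT_THRESHOLD_MS = 10 * 60 * 1000L; // the "10 minutes" from the proposal

  private final long thresholdMs;
  private final LongSupplier clock; // injected so tests can control time
  private boolean failing = false;  // true while the slave is not answering
  private long firstFailureMs;      // when the current run of failures started
  private boolean stopped = false;  // true once the threshold was exceeded

  PartitionTracker(long thresholdMs, LongSupplier clock) {
    this.thresholdMs = thresholdMs;
    this.clock = clock;
  }

  /** While false, the source should stop shipping edits and only send cheap probes. */
  boolean shouldReplicate() {
    return !stopped;
  }

  /** Record a failed attempt (shipment or probe) to reach the slave. */
  void onFailure() {
    long now = clock.getAsLong();
    if (!failing) {
      failing = true;
      firstFailureMs = now;
    }
    if (now - firstFailureMs > thresholdMs) {
      stopped = true;
    }
  }

  /** How long the slave has been unreachable so far, or 0 if it is reachable. */
  long partitionDurationMs() {
    return failing ? clock.getAsLong() - firstFailureMs : 0L;
  }

  /**
   * Record a successful contact. If replication had been stopped, returns the
   * {start, end} window (ms) that a catch-up job would need to copy; else null.
   */
  long[] onSuccess() {
    long now = clock.getAsLong();
    long[] window = stopped ? new long[] {firstFailureMs, now} : null;
    failing = false;
    stopped = false;
    return window;
  }
}
```

A source would call `onFailure()` or `onSuccess()` after every attempt. A short blip that ends before the threshold makes `onSuccess()` return `null` and never stops replication, so only genuinely long partitions would trigger the third, MR-based step.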

