[
https://issues.apache.org/jira/browse/HBASE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878703#action_12878703
]
HBase Review Board commented on HBASE-2223:
-------------------------------------------
Message from: "Jean-Daniel Cryans" <[email protected]>
bq. On 2010-06-11 15:31:37, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java, line 56
bq. > <http://review.hbase.org/r/76/diff/5/?file=1114#file1114line56>
bq. >
bq. > For sure setConf will have been called before we get here? So, stuff gets set up by setConf? Can setConf be called more than once? How do I know how to use this class? Not doc'd. Doesn't have a constructor.
LogCleanerDelegate is the interface that defines the general behavior. Yes, it should have a constructor.
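For reference, a minimal sketch of what implementing the delegate looks like (illustrative only; the interface shape, setConf/getConf plus isLogDeletable, is assumed from this discussion):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Illustrative delegate: all setup happens in setConf() rather than in a
    // constructor, since the cleaner chain instantiates delegates reflectively
    // and configures them afterwards.
    public class ExampleLogCleaner implements LogCleanerDelegate {
      private Configuration conf;

      @Override
      public void setConf(Configuration conf) {
        this.conf = conf;  // expected to be called once, before any checks
      }

      @Override
      public Configuration getConf() {
        return conf;
      }

      @Override
      public boolean isLogDeletable(Path filePath) {
        // Placeholder rule: refuse to delete anything until configured.
        return conf != null;
      }
    }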
bq. On 2010-06-11 15:31:37, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java, line 111
bq. > <http://review.hbase.org/r/76/diff/5/?file=1114#file1114line111>
bq. >
bq. > The way this is done, if I didn't want to wait on the ttl, then I'd have to write a new class. Can't we have ttl and replication be distinct and then, if I want to delete based off ttl and whether the log is up in zk, chain them?
I don't follow; chaining is already how I do it.
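Concretely, every configured delegate gets a veto (e.g. a TTL cleaner chained with this replication cleaner). A sketch of the idea (illustrative, not the actual LogCleaner code):

    import java.util.List;
    import org.apache.hadoop.fs.Path;

    final class LogCleanerChain {
      // A log is deletable only when every delegate in the chain agrees;
      // a single veto (TTL not expired, log still referenced in ZK) keeps it.
      static boolean isLogDeletable(List<LogCleanerDelegate> delegates, Path log) {
        for (LogCleanerDelegate delegate : delegates) {
          if (!delegate.isLogDeletable(log)) {
            return false;
          }
        }
        return true;
      }
    }

So deleting based on ttl alone just means configuring only the TTL delegate; no new class is needed.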
bq. On 2010-06-11 15:31:37, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java, line 54
bq. > <http://review.hbase.org/r/76/diff/5/?file=1116#file1116line54>
bq. >
bq. > I don't follow?
Yeah, RepSink is currently a mix of two solutions that features only the worst of both. The next patch will make it significantly better.
bq. On 2010-06-11 15:31:37, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java, line 126
bq. > <http://review.hbase.org/r/76/diff/5/?file=1117#file1117line126>
bq. >
bq. > This ain't a constructor?
It ain't, but it's used like one.
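That is, a bare constructor plus an init method that does the setup a constructor normally would, so the class can be instantiated first and wired up afterwards. An illustrative sketch (not the patch's actual signature):

    import org.apache.hadoop.conf.Configuration;

    // Illustrative "init used like a constructor" pattern.
    public class InitLikeConstructor {
      private Configuration conf;

      public InitLikeConstructor() {
        // No-arg construction, e.g. via reflection.
      }

      // Does what a constructor normally would; must be called before use.
      public void init(Configuration conf) {
        this.conf = conf;
      }
    }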
bq. On 2010-06-11 15:31:37, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java, line 483
bq. > <http://review.hbase.org/r/76/diff/5/?file=1117#file1117line483>
bq. >
bq. > We have to copy?
This is the downside of the way I'm capping the log entries by size or number: I'm reusing the same HLog.Entry[] entriesArray (and the entries in it) to read from HLogs. For example, say replicationQueueSizeCapacity=64MB and replicationQueueNbCapacity=25k. On a first run we reach 25k entries without hitting the size cap, so we replicate the whole array. On a second run we hit 64MB after only 10k entries; then we want to replicate just those 10k and not the 15k "leftovers" still sitting in the array from the previous run.
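In code, the copy amounts to something like this (sketch; entriesArray and the count are named as above, the helper class is hypothetical):

    import java.util.Arrays;

    final class BatchTrim {
      // The source reuses one entries array across runs, so a run that hits
      // the size cap early must copy out only its own entries; slots past
      // currentNbEntries still hold leftovers from an earlier, larger run.
      static <E> E[] currentBatch(E[] entriesArray, int currentNbEntries) {
        return Arrays.copyOf(entriesArray, currentNbEntries);
      }
    }

With the example above, the second run copies out the first 10k entries and leaves the 15k stale ones behind.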
bq. On 2010-06-11 15:31:37, stack wrote:
bq. > src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java, line 67
bq. > <http://review.hbase.org/r/76/diff/5/?file=1123#file1123line67>
bq. >
bq. > No dfs in this test. That's intentional?
Nope, should fix.
bq. On 2010-06-11 15:31:37, stack wrote:
bq. > src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSink.java, line 86
bq. > <http://review.hbase.org/r/76/diff/5/?file=1124#file1124line86>
bq. >
bq. > Can't you squash some of these tests together? They each start up their own minidfscluster... just start it once?
They don't? The per-test setup only truncates the tables:

    @Before
    public void setUp() throws Exception {
      // Per-test setup: no cluster start, just wipe the two tables.
      table1 = TEST_UTIL.truncateTable(TABLE_NAME1);
      table2 = TEST_UTIL.truncateTable(TABLE_NAME2);
      Thread.sleep(SLEEP_TIME);  // brief pause between tests
    }
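Presumably the cluster itself is brought up once for the whole class, along these lines (sketch; the actual class-level setup may differ):

    @BeforeClass
    public static void setUpBeforeClass() throws Exception {
      // One mini cluster (DFS included) for the entire test class.
      TEST_UTIL.startMiniCluster();
    }

    @AfterClass
    public static void tearDownAfterClass() throws Exception {
      TEST_UTIL.shutdownMiniCluster();
    }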
- Jean-Daniel
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/76/#review194
-----------------------------------------------------------
> Handle 10min+ network partitions between clusters
> -------------------------------------------------
>
> Key: HBASE-2223
> URL: https://issues.apache.org/jira/browse/HBASE-2223
> Project: HBase
> Issue Type: Sub-task
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.21.0
>
> Attachments: HBASE-2223.patch
>
>
> We need a nice way of handling long network partitions without impacting a
> master cluster (which pushes the data). Currently it will just retry over and
> over again.
> I think we could:
> - Stop replication to a slave cluster if it didn't respond for more than 10
> minutes
> - Keep track of the duration of the partition
> - When the slave cluster comes back, initiate a MR job like HBASE-2221
> Maybe we want less than 10 minutes, maybe we want this to be all automatic or
> just the first 2 parts. Discuss.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.