[
https://issues.apache.org/jira/browse/HBASE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878692#action_12878692
]
HBase Review Board commented on HBASE-2223:
-------------------------------------------
Message from: "Jean-Daniel Cryans" <[email protected]>
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > bin/replication/add_peer.rb, line 21
bq. > <http://review.hbase.org/r/76/diff/5/?file=1104#file1104line21>
bq. >
bq. > Should you point at some replication documentation here? Is there such a thing?
package.html later, should I point to it?
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > bin/replication/copy_tables_desc.rb, line 58
bq. > <http://review.hbase.org/r/76/diff/5/?file=1105#file1105line58>
bq. >
bq. > This could get a bit annoying I'd say.
It helped me a lot, remove if people complain?
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/HConstants.java, line 342
bq. > <http://review.hbase.org/r/76/diff/5/?file=1107#file1107line342>
bq. >
bq. > This has to go here? Can it go into one of the replication classes?
Used by master and region server, to me it belongs there.
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/master/ServerManager.java, line 156
bq. > <http://review.hbase.org/r/76/diff/5/?file=1109#file1109line156>
bq. >
bq. > Can't you just do c.get("key", defaultvalue)?
No, I also do a check on replication.
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java, line 929
bq. > <http://review.hbase.org/r/76/diff/5/?file=1110#file1110line929>
bq. >
bq. > You writing startcode into zk? Why not write servername -- the host+port+startcode combo?
To be coherent with the rest of the code that uses zookeeper.
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java, line 1075
bq. > <http://review.hbase.org/r/76/diff/5/?file=1110#file1110line1075>
bq. >
bq. > Is this directory name? Confusingly named given rootdir+regLogPathStr only adds up to repLogPath.
I don't understand you, but this code is going to be removed in my next patch
as I'm simplifying RepSink.
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java, line 55
bq. > <http://review.hbase.org/r/76/diff/5/?file=1113#file1113line55>
bq. >
bq. > Peers are named '1', '2'? Can't we have more meaningful names here?
We agreed that peers are identified with a short internally as it is stored. We
could use an external mapping of short->cute_name.
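
To make the short->cute_name idea concrete, here is a minimal sketch. The class `PeerNames` and its methods are hypothetical and not part of the patch; it only assumes that peers keep their short id internally and that a label is looked up on the side.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of the patch: peers stay identified by a
// short internally, and a separate map carries a human-readable label.
class PeerNames {
  private final Map<Short, String> names = new ConcurrentHashMap<>();

  void register(short peerId, String name) {
    names.put(peerId, name);
  }

  // Falls back to the numeric id when no label was registered.
  String displayName(short peerId) {
    return names.getOrDefault(peerId, Short.toString(peerId));
  }
}
```

With this shape the stored id never changes; only logs and shell output would go through `displayName`.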
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java, line 59
bq. > <http://review.hbase.org/r/76/diff/5/?file=1113#file1113line59>
bq. >
bq. > Use servername instead of startcode
Same comment as before, needs to be coherent.
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java, line 60
bq. > <http://review.hbase.org/r/76/diff/5/?file=1113#file1113line60>
bq. >
bq. > All RS's in a master cluster replicate?
Yep... was that an implicit way of saying that I need to document that in RZH?
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java, line 107
bq. > <http://review.hbase.org/r/76/diff/5/?file=1113#file1113line107>
bq. >
bq. > Should this class be called Wrapper instead of Helper?
Sure
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java, line 185
bq. > <http://review.hbase.org/r/76/diff/5/?file=1113#file1113line185>
bq. >
bq. > You mean 'ensemble' here rather than 'quorum' (Patrick will kill you if he sees you calling it a 'quorum' when you mean the other)
Argh I'm trying to correct myself but I'm still missing some of them. Thx!
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java, line 263
bq. > <http://review.hbase.org/r/76/diff/5/?file=1113#file1113line263>
bq. >
bq. > We keep up the replication position in zk? How much do we replicate in one go? It's not a single edit, is it? We do this for every log file?
Yes. A defined amount specified in ReplicationSource. No. Every current log
file, we only replicate one at a time per region server.
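
As a rough illustration of the answers above, the sketch below ships a bounded batch from the current log and then records the new position. It is an in-memory model under stated assumptions: `PositionTracker` is a hypothetical name, the "defined amount" is modelled as a number of edits, and a plain map stands in for the position kept in ZooKeeper.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; the patch keeps the position in ZooKeeper.
class PositionTracker {
  // log file name -> number of edits already shipped from that log
  private final Map<String, Long> positions = new HashMap<>();

  long position(String log) {
    return positions.getOrDefault(log, 0L);
  }

  // Ships at most batchSize edits from the log, then records the new
  // position. Returns the number of edits shipped in this round.
  int shipNextBatch(String log, List<String> edits, int batchSize) {
    int start = (int) position(log);
    int end = Math.min(edits.size(), start + batchSize);
    // A real source would send edits.subList(start, end) to the peer here.
    positions.put(log, (long) end);
    return end - start;
  }
}
```

Recording the position only after a batch is shipped means a restart resumes from the last completed batch rather than from the start of the log.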
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java, line 328
bq. > <http://review.hbase.org/r/76/diff/5/?file=1113#file1113line328>
bq. >
bq. > LOG.warn instead?
bq. >
I'll do like the rest and log.error
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java, line 354
bq. > <http://review.hbase.org/r/76/diff/5/?file=1113#file1113line354>
bq. >
bq. > We return empty map if clusters size is == 1? Should that be clusters.size == 0?
That part isn't clear enough, so the reason it's 1 and not 0 is that we put a
lock in there so it's listed in the znodes we fetch. Actually this should be <=
1 rather than ==.
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java, line 356
bq. > <http://review.hbase.org/r/76/diff/5/?file=1113#file1113line356>
bq. >
bq. > What's this about?
See previous comment, we lock the dead region server's znode by putting a lock
in there, but we don't want to process the hlogs under since... it's not a
cluster. Could use more doc.
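
The two replies above (the `<= 1` check and skipping the lock) can be sketched as follows. This is a simplified stand-in, not the code under review: `DeadServerQueues` and the `"lock"` child name are assumptions, and it returns a list of cluster ids where the real method returns a map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DeadServerQueues {
  // Assumed name of the lock child; the actual znode name is not given here.
  static final String LOCK = "lock";

  // children: names of the znodes under the dead region server's znode.
  static List<String> clustersToProcess(List<String> children) {
    // With only the lock (or nothing) present, there is no cluster to process.
    if (children.size() <= 1) {
      return Collections.emptyList();
    }
    List<String> clusters = new ArrayList<>();
    for (String child : children) {
      if (!LOCK.equals(child)) { // the lock is not a cluster, skip it
        clusters.add(child);
      }
    }
    return clusters;
  }
}
```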
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java, line 402
bq. > <http://review.hbase.org/r/76/diff/5/?file=1113#file1113line402>
bq. >
bq. > Just logging errors? What if session expired (our discussion from last day)?
Yes I need to review how I handle it in RZH, but I'd also need to review ZKW
since some methods will hide it in there.
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/package.html, line 41
bq. > <http://review.hbase.org/r/76/diff/5/?file=1115#file1115line41>
bq. >
bq. > Call it alpha
yeah! (j/k)
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/package.html, line 64
bq. > <http://review.hbase.org/r/76/diff/5/?file=1115#file1115line64>
bq. >
bq. > What's this about? You need to run zk yourself but no zoo.cfg?
I... don't remember why I wrote this.
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/package.html, line 73
bq. > <http://review.hbase.org/r/76/diff/5/?file=1115#file1115line73>
bq. >
bq. > And if not? What if replicating single-family only?
Forgot to update that after we added scoping, updating.
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/package.html, line 83
bq. > <http://review.hbase.org/r/76/diff/5/?file=1115#file1115line83>
bq. >
bq. > Has to be offline? Will this always be the case?
Currently everything is static, but I hope we can move on from that in the
future.
bq. On 2010-06-11 12:45:29, stack wrote:
bq. > src/main/java/org/apache/hadoop/hbase/replication/package.html, line 108
bq. > <http://review.hbase.org/r/76/diff/5/?file=1115#file1115line108>
bq. >
bq. > What's ratio?
This is a log snippet that's coming from a region server. Do you want to see
more documentation about it in package.html or in the logging itself?
- Jean-Daniel
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/76/#review191
-----------------------------------------------------------
> Handle 10min+ network partitions between clusters
> -------------------------------------------------
>
> Key: HBASE-2223
> URL: https://issues.apache.org/jira/browse/HBASE-2223
> Project: HBase
> Issue Type: Sub-task
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.21.0
>
> Attachments: HBASE-2223.patch
>
>
> We need a nice way of handling long network partitions without impacting a
> master cluster (which pushes the data). Currently it will just retry over and
> over again.
> I think we could:
> - Stop replication to a slave cluster if it didn't respond for more than 10
> minutes
> - Keep track of the duration of the partition
> - When the slave cluster comes back, initiate a MR job like HBASE-2221
> Maybe we want less than 10 minutes, maybe we want this to be all automatic or
> just the first 2 parts. Discuss.
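
One possible shape for the first two bullets of the description, as a sketch only: `PartitionMonitor` is a hypothetical class, time is passed in explicitly so the logic can be exercised without a clock, and the 10 minute threshold is the value floated in the issue, not a settled default.

```java
class PartitionMonitor {
  // The 10 minutes proposed in the issue description.
  static final long TIMEOUT_MS = 10 * 60 * 1000L;

  private long lastResponseMs;
  private long partitionStartMs = -1; // -1 means the peer is considered reachable

  PartitionMonitor(long nowMs) {
    lastResponseMs = nowMs;
  }

  // Call after each failed attempt; returns true once replication should stop.
  boolean onFailure(long nowMs) {
    if (partitionStartMs < 0 && nowMs - lastResponseMs > TIMEOUT_MS) {
      partitionStartMs = lastResponseMs;
    }
    return partitionStartMs >= 0;
  }

  // Call when the peer answers again; returns the partition duration in ms,
  // or 0 if replication was never stopped.
  long onSuccess(long nowMs) {
    long duration = partitionStartMs < 0 ? 0 : nowMs - partitionStartMs;
    partitionStartMs = -1;
    lastResponseMs = nowMs;
    return duration;
  }
}
```

The duration returned by `onSuccess` is what a follow-up job along the lines of HBASE-2221 could use to decide how much data to copy.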