-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/76/#review281
-----------------------------------------------------------


Missing is more desciption of how this feature works.  There is package doc. on 
how to get it going but needs fuller description of how replication works (how 
zk is used, how it manages hlogs, how it sends edits to remote cluster, etc.).  
There is insufficient description in javadoc.  Without this, there is a danger 
that only the author will be able to figure whats going on in here.

How does one get insight on to how well or badly replication is working?

Alot of errors are logged and then we just move on.  Thats fine for an alpha 
package but need a plan for making it all more robust?  What you thinking?


bin/replication/add_peer.rb
<http://review.hbase.org/r/76/#comment1187>

    These log messages add nothing?  Remove?  Just restating what was passed on 
cmdline.



bin/replication/add_peer.rb
<http://review.hbase.org/r/76/#comment1188>

    Why do this?  Its annoying?
    
    It'd be better spending time interrogating what the user passed to see if 
it makes sense throwing errors if incorrectly formatted or if it doesn't look 
like it makes sense?



bin/replication/add_peer.rb
<http://review.hbase.org/r/76/#comment1189>

    A message on end saying what it did can be comforting to users.



bin/replication/copy_tables_desc.rb
<http://review.hbase.org/r/76/#comment1190>

    See above comments.



src/main/java/org/apache/hadoop/hbase/HConstants.java
<http://review.hbase.org/r/76/#comment1191>

    Change name of this config to hbase.replication.



src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
<http://review.hbase.org/r/76/#comment1192>

    Does this method have prerequisites? For example, does replication have to 
be enabled?



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.hbase.org/r/76/#comment1193>

    Is this a leak from another patch?



src/main/java/org/apache/hadoop/hbase/master/RegionServerOperationQueue.java
<http://review.hbase.org/r/76/#comment1194>

    This is a leak from another patch?



src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
<http://review.hbase.org/r/76/#comment1195>

    Whats going on here?  We're setting into config the logcleaner to use but 
then in the lines after this we go ahead and create OldLogsCleaner anyways?



src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.hbase.org/r/76/#comment1196>

    Just call this 'replication' or 'replicationEnabled'



src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.hbase.org/r/76/#comment1198>

    We are a ReplicationSink or a ReplicationMaster?  One or the other?  Can we 
not move all of this Replication stuff into a Replication class including the 
setup of whether we are a Sink or Master rather than have it hang out here in 
HRS?  



src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.hbase.org/r/76/#comment1197>

    I don't grok the comment



src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.hbase.org/r/76/#comment1199>

    This could be null if we are a replication master?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/76/#comment1200>

    If the above mentioned Replication class were a singleton, it could be 
shared here with HRS



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/76/#comment1201>

    It'd be good if we didn't have to create this byte [] per edit.  I should 
reinstitute the comparator I had that took KVs but only compared family portion 
of the KV... no need to create family byte [] then.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/76/#comment1202>

    YOu might say why its being moved.  Should be INFO level?  WALs are kinda 
important to track?



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1203>

    isReplicating is what you would name the method that accesses the boolean 
replicating... with this name the accessor should be named isIsReplicating.



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1204>

    Why do it this way?  Why an if/else?  Why not do if (...length != 3) throw 
.... no need of an else.



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1205>

    This is an error? Is it critical fail (We dropped recording a WAL)  Are we 
just logging and moving on?



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1206>

    Same here.... We just log and keep going?  Are any of these fatal?



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1207>

    If this fails, what are the repercusssions?



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1208>

    What happens if this server dies?  It gets cleaned up by master?
    
    What happens to replication if master dies?  Does master failover need to 
do anything special to pick up running replication?



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1209>

    spelling
    



src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java
<http://review.hbase.org/r/76/#comment1213>

    Why this implementation have to know about other implementations?  Can't we 
do a chain of decision classes? Any class can say no?  As soon as any decision 
class says no, we exit the chain.... So in this case, first on the chain would 
be the ttl decision... then would be this one... and third would be the 
snapshotting decision. You don't have to do the chain as part of this patch but 
please open an issue to implement.



src/main/java/org/apache/hadoop/hbase/replication/package.html
<http://review.hbase.org/r/76/#comment1210>

    Its not just MDC, right?



src/main/java/org/apache/hadoop/hbase/replication/package.html
<http://review.hbase.org/r/76/#comment1211>

    Is it user tables or CF in user tables?



src/main/java/org/apache/hadoop/hbase/replication/package.html
<http://review.hbase.org/r/76/#comment1212>

    Good doc.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1214>

    What is the 'main thread'?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1215>

    Can you make this clearer?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1216>

    Whats this mean?  Lost edits?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1217>

    Maybe say something about how it works... pool of HTables to write remote 
peer, etc.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1218>

    Why again does replication keep its own set of logs rather than just keep 
positions on HRS logs (whether local HRS or the HRS of others?)  So it prevails 
across HRS failures?  Aren't logs archived now rather than thrown away so this 
should be fine?  Otherwise, we are writing edits to FS twice, right?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1220>

    This should be inside a finally block?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1221>

    Duplicated code -- see up about ten lines.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1222>

    You are stopping the regionserver because replication failed?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1223>

    One second seems short?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1226>

    We carry edits in memory?  And its outside of our normal accounting -- e.g. 
cache and max for memstores?  How we get it into the mix?  Else we could OOME 
if we don't account for this memory payload.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1224>

    Ratio?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1227>

    Not a constructor



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1228>

    Stop cluster or stop the host?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1230>

    Whats this about?  IN particular, the 'handling story'...



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1231>

    When would it make sense to ever replicate catalog table entries?
    
    Why we even in here reading a file if replicating == false



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1232>

    Are you checking size of these edits?  You'll read 25k entries of any size?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1233>

    Where do you explain replication hlog filenaming convention?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java
<http://review.hbase.org/r/76/#comment1234>

    Can only do one sink, right?  Might want to say so.  How does the source 
know the sink to replicate to?  Should that be part of the Interface?
    
    The comment for the Interface could say how the Interface is used... init, 
then set sink, then per file archived, call ... etc.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1235>

    Make this "This class manages all the replication sources"
    
    "There are two classes of them"... what you mean to say here?  Two classes 
of source?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1236>

    I don't follow this commentary.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1238>

    You should say more here about whats going on.. either here or up in the 
class comment.  It has watchers on all other RSs in current cluster and will 
try to take on the workloads of others on crash (?).



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1239>

    Comment is cut off.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1240>

    You reporting status or updating updated position into zk?  There is a 
running status being kept up in zk?  Shouldn't this log message have the log 
path in it too?
    
    This method does more than update position or report status... looks like 
it changes state in this class updating set of logs?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1241>

    Shouldn't this be nonmodifiable List that you give out here since it seems 
you have it synchronized internally here.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1243>

    What other regionserver?  There is alot that goes unexplained in here?  
Could point to the doc. on how replication works.  Same for the transer method 
above (I have an idea because you white boarded it for me).



src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
<http://review.hbase.org/r/76/#comment1244>

    Good.


- stack


On 2010-06-23 17:24:52, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/76/
> -----------------------------------------------------------
> 
> (Updated 2010-06-23 17:24:52)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This is HBASE-2223 AKA Replication 2.0, it is currently only a "preview 
> patch" as it's pretty much feature complete, works on a cluster, has unit 
> tests and whatnot, but it could use a lot more testing and cleaning and ideas 
> from others.
> 
> 
> This addresses bug HBASE-2223.
>     http://issues.apache.org/jira/browse/HBASE-2223
> 
> 
> Diffs
> -----
> 
>   bin/replication/add_peer.rb PRE-CREATION 
>   bin/replication/copy_tables_desc.rb PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/HConstants.java 4357ee5 
>   src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 62617ac 
>   src/main/java/org/apache/hadoop/hbase/master/HMaster.java 5367638 
>   src/main/java/org/apache/hadoop/hbase/master/LogCleanerDelegate.java 
> 4c5153e 
>   
> src/main/java/org/apache/hadoop/hbase/master/RegionServerOperationQueue.java 
> 10f9dbd 
>   src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 9fb1cce 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
> 33b9a99 
>   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 94f18c6 
>   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 5d4cffe 
>   
> src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java
>  PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/replication/package.html PRE-CREATION 
>   
> src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
>  PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 
> b26c071 
>   src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 0e69709 
>   
> src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java 
> PRE-CREATION 
>   src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java 
> PRE-CREATION 
>   
> src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java 
> PRE-CREATION 
>   
> src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSink.java
>  PRE-CREATION 
>   
> src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
>  PRE-CREATION 
> 
> Diff: http://review.hbase.org/r/76/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>

Reply via email to