----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/76/#review281 -----------------------------------------------------------
Missing is more desciption of how this feature works. There is package doc. on how to get it going but needs fuller description of how replication works (how zk is used, how it manages hlogs, how it sends edits to remote cluster, etc.). There is insufficient description in javadoc. Without this, there is a danger that only the author will be able to figure whats going on in here. How does one get insight on to how well or badly replication is working? Alot of errors are logged and then we just move on. Thats fine for an alpha package but need a plan for making it all more robust? What you thinking? bin/replication/add_peer.rb <http://review.hbase.org/r/76/#comment1187> These log messages add nothing? Remove? Just restating what was passed on cmdline. bin/replication/add_peer.rb <http://review.hbase.org/r/76/#comment1188> Why do this? Its annoying? It'd be better spending time interrogating what the user passed to see if it makes sense throwing errors if incorrectly formatted or if it doesn't look like it makes sense? bin/replication/add_peer.rb <http://review.hbase.org/r/76/#comment1189> A message on end saying what it did can be comforting to users. bin/replication/copy_tables_desc.rb <http://review.hbase.org/r/76/#comment1190> See above comments. src/main/java/org/apache/hadoop/hbase/HConstants.java <http://review.hbase.org/r/76/#comment1191> Change name of this config to hbase.replication. src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java <http://review.hbase.org/r/76/#comment1192> Does this method have prerequisites? For example, does replication have to be enabled? src/main/java/org/apache/hadoop/hbase/master/HMaster.java <http://review.hbase.org/r/76/#comment1193> Is this a leak from another patch? src/main/java/org/apache/hadoop/hbase/master/RegionServerOperationQueue.java <http://review.hbase.org/r/76/#comment1194> This is a leak from another patch? src/main/java/org/apache/hadoop/hbase/master/ServerManager.java <http://review.hbase.org/r/76/#comment1195> Whats going on here? We're setting into config the logcleaner to use but then in the lines after this we go ahead and create OldLogsCleaner anyways? src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java <http://review.hbase.org/r/76/#comment1196> Just call this 'replication' or 'replicationEnabled' src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java <http://review.hbase.org/r/76/#comment1198> We are a ReplicationSink or a ReplicationMaster? One or the other? Can we not move all of this Replication stuff into a Replication class including the setup of whether we are a Sink or Master rather than have it hang out here in HRS? src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java <http://review.hbase.org/r/76/#comment1197> I don't grok the comment src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java <http://review.hbase.org/r/76/#comment1199> This could be null if we are a replication master? src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java <http://review.hbase.org/r/76/#comment1200> If the above mentioned Replication class were a singleton, it could be shared here with HRS src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java <http://review.hbase.org/r/76/#comment1201> It'd be good if we didn't have to create this byte [] per edit. I should reinstitute the comparator I had that took KVs but only compared family portion of the KV... no need to create family byte [] then. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java <http://review.hbase.org/r/76/#comment1202> YOu might say why its being moved. Should be INFO level? WALs are kinda important to track? src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java <http://review.hbase.org/r/76/#comment1203> isReplicating is what you would name the method that accesses the boolean replicating... with this name the accessor should be named isIsReplicating. src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java <http://review.hbase.org/r/76/#comment1204> Why do it this way? Why an if/else? Why not do if (...length != 3) throw .... no need of an else. src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java <http://review.hbase.org/r/76/#comment1205> This is an error? Is it critical fail (We dropped recording a WAL) Are we just logging and moving on? src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java <http://review.hbase.org/r/76/#comment1206> Same here.... We just log and keep going? Are any of these fatal? src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java <http://review.hbase.org/r/76/#comment1207> If this fails, what are the repercusssions? src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java <http://review.hbase.org/r/76/#comment1208> What happens if this server dies? It gets cleaned up by master? What happens to replication if master dies? Does master failover need to do anything special to pick up running replication? src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java <http://review.hbase.org/r/76/#comment1209> spelling src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java <http://review.hbase.org/r/76/#comment1213> Why this implementation have to know about other implementations? Can't we do a chain of decision classes? Any class can say no? As soon as any decision class says no, we exit the chain.... So in this case, first on the chain would be the ttl decision... then would be this one... and third would be the snapshotting decision. You don't have to do the chain as part of this patch but please open an issue to implement. src/main/java/org/apache/hadoop/hbase/replication/package.html <http://review.hbase.org/r/76/#comment1210> Its not just MDC, right? src/main/java/org/apache/hadoop/hbase/replication/package.html <http://review.hbase.org/r/76/#comment1211> Is it user tables or CF in user tables? src/main/java/org/apache/hadoop/hbase/replication/package.html <http://review.hbase.org/r/76/#comment1212> Good doc. src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java <http://review.hbase.org/r/76/#comment1214> What is the 'main thread'? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java <http://review.hbase.org/r/76/#comment1215> Can you make this clearer? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java <http://review.hbase.org/r/76/#comment1216> Whats this mean? Lost edits? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java <http://review.hbase.org/r/76/#comment1217> Maybe say something about how it works... pool of HTables to write remote peer, etc. src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java <http://review.hbase.org/r/76/#comment1218> Why again does replication keep its own set of logs rather than just keep positions on HRS logs (whether local HRS or the HRS of others?) So it prevails across HRS failures? Aren't logs archived now rather than thrown away so this should be fine? Otherwise, we are writing edits to FS twice, right? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java <http://review.hbase.org/r/76/#comment1220> This should be inside a finally block? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java <http://review.hbase.org/r/76/#comment1221> Duplicated code -- see up about ten lines. src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java <http://review.hbase.org/r/76/#comment1222> You are stopping the regionserver because replication failed? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java <http://review.hbase.org/r/76/#comment1223> One second seems short? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java <http://review.hbase.org/r/76/#comment1226> We carry edits in memory? And its outside of our normal accounting -- e.g. cache and max for memstores? How we get it into the mix? Else we could OOME if we don't account for this memory payload. src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java <http://review.hbase.org/r/76/#comment1224> Ratio? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java <http://review.hbase.org/r/76/#comment1227> Not a constructor src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java <http://review.hbase.org/r/76/#comment1228> Stop cluster or stop the host? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java <http://review.hbase.org/r/76/#comment1230> Whats this about? IN particular, the 'handling story'... src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java <http://review.hbase.org/r/76/#comment1231> When would it make sense to ever replicate catalog table entries? Why we even in here reading a file if replicating == false src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java <http://review.hbase.org/r/76/#comment1232> Are you checking size of these edits? You'll read 25k entries of any size? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java <http://review.hbase.org/r/76/#comment1233> Where do you explain replication hlog filenaming convention? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java <http://review.hbase.org/r/76/#comment1234> Can only do one sink, right? Might want to say so. How does the source know the sink to replicate to? Should that be part of the Interface? The comment for the Interface could say how the Interface is used... init, then set sink, then per file archived, call ... etc. src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java <http://review.hbase.org/r/76/#comment1235> Make this "This class manages all the replication sources" "There are two classes of them"... what you mean to say here? Two classes of source? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java <http://review.hbase.org/r/76/#comment1236> I don't follow this commentary. src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java <http://review.hbase.org/r/76/#comment1238> You should say more here about whats going on.. either here or up in the class comment. It has watchers on all other RSs in current cluster and will try to take on the workloads of others on crash (?). src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java <http://review.hbase.org/r/76/#comment1239> Comment is cut off. src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java <http://review.hbase.org/r/76/#comment1240> You reporting status or updating updated position into zk? There is a running status being kept up in zk? Shouldn't this log message have the log path in it too? This method does more than update position or report status... looks like it changes state in this class updating set of logs? src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java <http://review.hbase.org/r/76/#comment1241> Shouldn't this be nonmodifiable List that you give out here since it seems you have it synchronized internally here. src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java <http://review.hbase.org/r/76/#comment1243> What other regionserver? There is alot that goes unexplained in here? Could point to the doc. on how replication works. Same for the transer method above (I have an idea because you white boarded it for me). src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java <http://review.hbase.org/r/76/#comment1244> Good. - stack On 2010-06-23 17:24:52, Jean-Daniel Cryans wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://review.hbase.org/r/76/ > ----------------------------------------------------------- > > (Updated 2010-06-23 17:24:52) > > > Review request for hbase. > > > Summary > ------- > > This is HBASE-2223 AKA Replication 2.0, it is currently only a "preview > patch" as it's pretty much feature complete, works on a cluster, has unit > tests and whatnot, but it could use a lot more testing and cleaning and ideas > from others. > > > This addresses bug HBASE-2223. > http://issues.apache.org/jira/browse/HBASE-2223 > > > Diffs > ----- > > bin/replication/add_peer.rb PRE-CREATION > bin/replication/copy_tables_desc.rb PRE-CREATION > src/main/java/org/apache/hadoop/hbase/HConstants.java 4357ee5 > src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 62617ac > src/main/java/org/apache/hadoop/hbase/master/HMaster.java 5367638 > src/main/java/org/apache/hadoop/hbase/master/LogCleanerDelegate.java > 4c5153e > > src/main/java/org/apache/hadoop/hbase/master/RegionServerOperationQueue.java > 10f9dbd > src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 9fb1cce > src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java > 33b9a99 > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 94f18c6 > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 5d4cffe > > src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java > PRE-CREATION > > src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java > PRE-CREATION > src/main/java/org/apache/hadoop/hbase/replication/package.html PRE-CREATION > > src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java > PRE-CREATION > > src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java > PRE-CREATION > > src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java > PRE-CREATION > > src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java > PRE-CREATION > src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java > b26c071 > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 0e69709 > > src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java > PRE-CREATION > src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java > PRE-CREATION > > src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java > PRE-CREATION > > src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSink.java > PRE-CREATION > > src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java > PRE-CREATION > > Diff: http://review.hbase.org/r/76/diff > > > Testing > ------- > > > Thanks, > > Jean-Daniel > >