[ https://issues.apache.org/jira/browse/HBASE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728465#comment-13728465 ]
Gabriel Reid commented on HBASE-8561:
-------------------------------------

Wouldn't it be easier to just throw an exception up the stack and then (I assume) abort the regionserver? That way it'll be immediately clear if there is an issue, as well as removing the need to do null checking everywhere in the replication code.

> [replication] Don't instantiate a ReplicationSource if the passed implementation isn't found
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8561
>                 URL: https://issues.apache.org/jira/browse/HBASE-8561
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.94.6.1
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.98.0, 0.95.2
>
>
> I was debugging a case where the region servers were dying with:
> {noformat}
> ABORTING region server someserver.com,60020,1368123702806: Writing replication status
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/someserver.com,60020,1368123702806/etcetcetc/somserver.com%2C60020%2C1368123702740.1368123705091
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:354)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:846)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:898)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:892)
>         at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:558)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:638)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:387)
> {noformat}
> Turns out the problem really was:
> {noformat}
> 2013-05-09 11:21:45,625 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Passed replication source implementation throws errors, defaulting to ReplicationSource
> java.lang.ClassNotFoundException: Some.Other.ReplicationSource.Implementation
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:186)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:324)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:202)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.init(ReplicationSourceManager.java:174)
>         at org.apache.hadoop.hbase.replication.regionserver.Replication.startReplicationService(Replication.java:171)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.startServiceThreads(HRegionServer.java:1583)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1042)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:698)
>         at java.lang.Thread.run(Thread.java:722)
> {noformat}
> So I think instantiating a ReplicationSource here is wrong and
> makes it harder to debug.
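
For illustration, the fail-fast approach suggested in the comment (throw when the configured implementation cannot be loaded, rather than silently defaulting to ReplicationSource) might look roughly like the sketch below. This is not the HBase code or the eventual patch: `ReplicationSourceLoading`, `SourceInterface`, `DefaultSource`, and `loadSource` are hypothetical names made up for the example, and the choice of `IllegalStateException` is an assumption. Only standard JDK reflection calls are used.

```java
// Hypothetical sketch: none of these names come from the HBase code base.
final class ReplicationSourceLoading {

    /** Stand-in for whatever interface replication sources implement. */
    interface SourceInterface { }

    /** Stand-in for the default implementation. */
    public static class DefaultSource implements SourceInterface {
        public DefaultSource() { }
    }

    /**
     * Loads and instantiates the configured implementation. Any failure is
     * rethrown with the original cause attached, so the caller can abort
     * right away instead of continuing with a fallback source or a null.
     */
    static SourceInterface loadSource(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return clazz.asSubclass(SourceInterface.class)
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException(
                "Cannot instantiate replication source implementation " + className, e);
        }
    }

    public static void main(String[] args) {
        // A class that exists and implements the interface loads normally.
        SourceInterface ok = loadSource(DefaultSource.class.getName());
        System.out.println("Loaded " + ok.getClass().getName());

        // The class name from the log above does not exist, so this throws.
        try {
            loadSource("Some.Other.ReplicationSource.Implementation");
        } catch (IllegalStateException e) {
            System.out.println("Refused to start: " + e.getMessage());
        }
    }
}
```

Because the exception carries the `ClassNotFoundException` as its cause, the misconfiguration would show up at startup in the abort message itself, rather than much later as an unrelated-looking ZooKeeper NoNode error. Callers also never receive a null source, which is the null-checking concern raised in the comment.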