[ https://issues.apache.org/jira/browse/HBASE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated HBASE-4045:
--------------------------------------
Attachment: HBASE-4045.patch
Easy fix: instead of returning null from fetchSlavesAddresses, I'll return an
empty list.
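
For readers without the patch at hand, here is a minimal sketch of the idea, not the patch itself. The class FetchSlavesSketch, the ChildLister interface, and countCandidateSinks are hypothetical stand-ins for the ZooKeeper lookup and its caller; only the method name fetchSlavesAddresses and the /hbase/rs znode come from this issue. The point is that a failed lookup yields an empty list, so the caller can use the result without a null check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the null-to-empty-list fix; not the HBase source. */
public class FetchSlavesSketch {

    /** Stand-in for a ZooKeeper child listing that may fail, e.g. on connection loss. */
    interface ChildLister {
        List<String> listChildren(String znode) throws Exception;
    }

    /** Returns the children of the znode, or an empty list if the lookup fails. Never null. */
    static List<String> fetchSlavesAddresses(ChildLister lister, String znode) {
        try {
            List<String> children = lister.listChildren(znode);
            if (children == null) {
                return Collections.emptyList();
            }
            return new ArrayList<>(children);
        } catch (Exception e) {
            // Report the failure, then hand back an empty list instead of null.
            System.err.println("Cannot get peer's region server addresses: " + e);
            return Collections.emptyList();
        }
    }

    /** Hypothetical caller: safe to call size() because the result is never null. */
    static int countCandidateSinks(ChildLister lister) {
        return fetchSlavesAddresses(lister, "/hbase/rs").size();
    }

    public static void main(String[] args) {
        // Simulate ZooKeeper being unreachable.
        ChildLister zkGone = znode -> {
            throw new Exception("ConnectionLoss for " + znode);
        };
        System.out.println("candidate sinks: " + countCandidateSinks(zkGone));
    }
}
```

With an empty result the caller simply has no sinks to choose from for now and can retry later, rather than dying on a NullPointerException.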
> [replication] NPE in ReplicationSource when ZK is gone
> ------------------------------------------------------
>
> Key: HBASE-4045
> URL: https://issues.apache.org/jira/browse/HBASE-4045
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.3
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Priority: Minor
> Fix For: 0.90.4
>
> Attachments: HBASE-4045.patch
>
>
> We hit this in production. It killed the replication thread, but the server
> itself was fine and the master kept the logs:
> {quote}
> 2011-06-26 16:02:56,092 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 26667ms for sessionid
> 0x22f9dcb30ab01b8, closing socket connection and attempting reconnect
> 2011-06-26 16:02:56,213 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: connection to cluster:
> 5-0x22f9dcb30ab01b8-0x22f9dcb30ab01b8 Received ZooKeeper Event, type=None,
> state=Disconnected, path=null
> 2011-06-26 16:02:56,213 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: connection to cluster:
> 5-0x22f9dcb30ab01b8-0x22f9dcb30ab01b8 Received Disconnected from ZooKeeper,
> ignoring
> 2011-06-26 16:02:56,213 WARN
> org.apache.hadoop.hbase.replication.ReplicationZookeeper: Cannot get peer's
> region server addresses
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
> = ConnectionLoss for /hbase/rs
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
> at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:389)
> at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndGetAsAddresses(ZKUtil.java:355)
> at
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.fetchSlavesAddresses(ReplicationZookeeper.java:228)
> at
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.getSlavesAddresses(ReplicationZookeeper.java:216)
> at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:205)
> at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:588)
> at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:341)
> 2011-06-26 16:02:56,222 ERROR
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing
> source 5 because an error occurred: Uncaught exception during runtime
> java.lang.Exception: java.lang.NullPointerException
> at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$1.uncaughtException(ReplicationSource.java:628)
> at java.lang.Thread.dispatchUncaughtException(Thread.java:1874)
> Caused by: java.lang.NullPointerException
> at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:208)
> at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:588)
> at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:341)
> {quote}