[ https://issues.apache.org/jira/browse/HBASE-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dave Latham updated HBASE-8096: ------------------------------- Description: We're getting an NPE during replication, which causes replication for that RegionServer to stop until we restart it. {noformat} 2013-03-10 12:49:12,679 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unexpected exception in ReplicationSource, currentPath=hdfs://hmaster1:9000/hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1882) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1855) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1831) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154) at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108) at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1482) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:505) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:313) {noformat} Some extra digging into the DataNode and NameNode logs makes this seem related to HBASE-7530 and HDFS-4380 Here's the relevant snipped portions of the RS, DN, and NN logs: {noformat} RS 2013-03-10 12:49:12,618 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #hslave1177%2C60020%2C1362549511446.1362944946489 for position 59670826 in hdfs://hmaster1:9000/hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 RS 2013-03-10 12:49:12,621 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested RS 2013-03-10 12:49:12,623 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicated in total: 31500300 RS 2013-03-10 12:49:12,623 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication hslave1177%2C60020%2C1362549511446.1362944946489 at 59670826 NN 2013-03-10 12:49:12,627 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489. blk_6905758215335505153_656717631 RS 2013-03-10 12:49:12,679 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unexpected exception in ReplicationSource, currentPath=hdfs://hmaster1:9000/hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 DN 2013-03-10 12:49:12,680 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6905758215335505153_656717631 src: /192.168.44.1:43503 dest: /192.168.44.1:50010 NN 2013-03-10 12:49:12,804 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.fsync: file /hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 for DFSClient_hb_rs_hslave1177,60020,1362549511446 {noformat} was: We're getting an NPE during replication, which causes replication for that RegionServer to stop until we restart it. {noformat} 2013-03-10 12:49:12,679 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unexpected exception in ReplicationSource, currentPath=hdfs://hmaster1:9000/hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1882) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1855) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1831) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154) at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108) at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1482) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308) at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:505) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:313) {noformat} Some extra digging into the DataNode and NameNode logs makes this seem related to: https://issues.apache.org/jira/browse/HBASE-7530 and https://issues.apache.org/jira/browse/HDFS-4380 Here's the relevant snipped portions of the RS, DN, and NN logs: {noformat} RS 2013-03-10 12:49:12,618 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #hslave1177%2C60020%2C1362549511446.1362944946489 for position 59670826 in hdfs://hmaster1:9000/hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 RS 2013-03-10 12:49:12,621 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested RS 2013-03-10 12:49:12,623 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicated in total: 31500300 RS 2013-03-10 12:49:12,623 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication hslave1177%2C60020%2C1362549511446.1362944946489 at 59670826 NN 2013-03-10 12:49:12,627 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489. blk_6905758215335505153_656717631 RS 2013-03-10 12:49:12,679 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unexpected exception in ReplicationSource, currentPath=hdfs://hmaster1:9000/hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 DN 2013-03-10 12:49:12,680 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6905758215335505153_656717631 src: /192.168.44.1:43503 dest: /192.168.44.1:50010 NN 2013-03-10 12:49:12,804 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.fsync: file /hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 for DFSClient_hb_rs_hslave1177,60020,1362549511446 {noformat} > [replication] NPE while replicating a log that is acquiring a new block from > HDFS > ---------------------------------------------------------------------------------- > > Key: HBASE-8096 > URL: https://issues.apache.org/jira/browse/HBASE-8096 > Project: HBase > Issue Type: Bug > Affects Versions: 0.94.5 > Reporter: Ian Friedman > > We're getting an NPE during replication, which causes replication for that > RegionServer to stop until we restart it. > {noformat} > 2013-03-10 12:49:12,679 ERROR > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: > Unexpected exception in ReplicationSource, > currentPath=hdfs://hmaster1:9000/hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1882) > at > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1855) > at > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1831) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578) > at > org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154) > at > org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108) > at > org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495) > at > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62) > at > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1482) > at > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475) > at > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470) > at > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55) > at > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:505) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:313) > {noformat} > Some extra digging into the DataNode and NameNode logs makes this seem > related to HBASE-7530 and HDFS-4380 > Here's the relevant snipped portions of the RS, DN, and NN logs: > {noformat} > RS 2013-03-10 12:49:12,618 INFO > org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: > Going to report log #hslave1177%2C60020%2C1362549511446.1362944946489 for > position 59670826 in > hdfs://hmaster1:9000/hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 > RS 2013-03-10 12:49:12,621 DEBUG > org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested > RS 2013-03-10 12:49:12,623 DEBUG > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: > Replicated in total: 31500300 > RS 2013-03-10 12:49:12,623 DEBUG > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening > log for replication hslave1177%2C60020%2C1362549511446.1362944946489 at > 59670826 > NN 2013-03-10 12:49:12,627 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.allocateBlock: > /hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489. > blk_6905758215335505153_656717631 > RS 2013-03-10 12:49:12,679 ERROR > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: > Unexpected exception in ReplicationSource, > currentPath=hdfs://hmaster1:9000/hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 > DN 2013-03-10 12:49:12,680 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block > blk_6905758215335505153_656717631 src: /192.168.44.1:43503 dest: > /192.168.44.1:50010 > NN 2013-03-10 12:49:12,804 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.fsync: file > /hbase/.logs/hslave1177,60020,1362549511446/hslave1177%2C60020%2C1362549511446.1362944946489 > for DFSClient_hb_rs_hslave1177,60020,1362549511446 > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira