[ https://issues.apache.org/jira/browse/HBASE-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jieshan Bean updated HBASE-8253:
--------------------------------
Attachment: HBASE-8253-94.patch
Patch for discussion.
In ReplicationSource#readAllEntriesToReplicateOrNextFile, only the read of the
first edit can throw an EOFException. So when we get an EOF, currentNbEntries
should be 0; there is no other case.
Please correct me if I am wrong.
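
To make the reasoning above concrete, here is a minimal, self-contained sketch of the decision it seems to imply. It is not the attached patch. The class EofSkipSketch, the method shouldSkipToNextLog, and the isLastLogInQueue parameter are hypothetical names introduced for illustration; only currentNbEntries comes from the comment above. The sketch assumes that a log may be skipped when the EOF is hit before any entry of the current batch was read and the log is not the last one in the queue.

```java
import java.io.EOFException;
import java.io.IOException;

public class EofSkipSketch {

    /**
     * Hypothetical helper (not HBase API): decides whether the reader could
     * give up on the current log and move on to the next one in the queue.
     */
    public static boolean shouldSkipToNextLog(IOException error,
                                              int currentNbEntries,
                                              boolean isLastLogInQueue) {
        // The exception may be an EOFException itself or merely wrap one.
        boolean eof = error instanceof EOFException
                || error.getCause() instanceof EOFException;
        // Assumption: an EOF with zero entries read means the very first
        // read of the batch failed, so nothing readable is being dropped.
        // The last log in the queue may still be written to, so keep it.
        return eof && currentNbEntries == 0 && !isLastLogInQueue;
    }

    public static void main(String[] args) {
        IOException wrapped = new IOException("read failed", new EOFException());

        // EOF on the first read, log is not the last in the queue: skip.
        System.out.println(shouldSkipToNextLog(wrapped, 0, false));
        // Same EOF, but the log is the last one in the queue: keep retrying.
        System.out.println(shouldSkipToNextLog(wrapped, 0, true));
        // A non-EOF failure: never skip.
        System.out.println(shouldSkipToNextLog(new IOException("other"), 0, false));
    }
}
```

Whether skipping is actually safe depends on the claim above holding, namely that an EOF never arrives after some entries of the batch were already read.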
> A corrupted log blocked ReplicationSource
> -----------------------------------------
>
> Key: HBASE-8253
> URL: https://issues.apache.org/jira/browse/HBASE-8253
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 0.94.6
> Reporter: Jieshan Bean
> Assignee: Jieshan Bean
> Attachments: HBASE-8253-94.patch
>
>
> A log that was being written got corrupted when we forcibly powered down one
> node. Only part of the last WALEdit was written into that log, and that log
> was not the last one in the replication queue.
> ReplicationSource was blocked in this scenario. Many log entries like the
> ones below were printed:
> {noformat}
> 2013-03-30 06:53:48,628 WARN [regionserver26003-EventThread.replicationSource,1] 1 Got: org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
> java.io.EOFException: hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510, entryStart=40434738, pos=40450048, end=40450048, edit=0
>     at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
>     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:238)
>     ... 3 more
> ..........
> 2013-03-30 06:54:38,899 WARN [regionserver26003-EventThread.replicationSource,1] 1 Got: org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
> java.io.EOFException: hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510, entryStart=40434738, pos=40450048, end=40450048, edit=0
> ...........
> {noformat}
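
The "Caused by" frames in the trace above bottom out in DataInputStream.readFully, which throws EOFException when the stream ends before the requested number of bytes has arrived. The sketch below reproduces that behaviour with plain java.io and a made-up length-prefixed record format. It is not the real SequenceFile layout, and TruncatedRecordSketch, writeRecord, countCompleteRecords, and sampleLog are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

public class TruncatedRecordSketch {

    /** Writes one record as a 4-byte length followed by the payload. */
    public static void writeRecord(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
    }

    /**
     * Counts the records in the buffer. Throws EOFException if the buffer
     * ends in the middle of a record, like a log cut off by a power loss.
     */
    public static int countCompleteRecords(byte[] log) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        int count = 0;
        while (in.available() > 0) {
            int length = in.readInt();     // EOFException if fewer than 4 bytes remain
            byte[] payload = new byte[length];
            in.readFully(payload);         // EOFException if the payload is cut short
            count++;
        }
        return count;
    }

    /** Builds a small in-memory "log" holding two complete records. */
    public static byte[] sampleLog() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeRecord(out, new byte[] {1, 2, 3});
        writeRecord(out, new byte[] {4, 5, 6, 7});
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] full = sampleLog();
        System.out.println(countCompleteRecords(full));

        // Drop the last two bytes to simulate a partially written final record.
        byte[] truncated = Arrays.copyOf(full, full.length - 2);
        try {
            countCompleteRecords(truncated);
        } catch (EOFException e) {
            System.out.println("EOF on partial record");
        }
    }
}
```

Retrying the read on the truncated buffer fails the same way every time, which is consistent with the repeated warnings shown in the log excerpt.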