[
https://issues.apache.org/jira/browse/HBASE-13724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552817#comment-14552817
]
stack commented on HBASE-13724:
-------------------------------
Running with asserts in production is not usual practice, so you will probably
find lots of 'interesting' issues.
Regarding this current one, our assert should print out something more useful
than just the fact that it tripped. I wonder what realLength is coming back as
in this case.
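A minimal sketch of an assertion that reports what tripped instead of just that it tripped. It assumes the failing check at SequenceFileLogReader.java:121 compares realLength (the length the underlying DFS stream reports) against the length the reader was opened with. `WalLengthCheck`, `describe` and `adjustment` are hypothetical names, not HBase code. The expression after the `:` is Java's standard assert detail message, which becomes the AssertionError's message when the JVM runs with -ea.

```java
// Hypothetical helper, not part of HBase. Assumes the failing check is
// "realLength >= expectedLength" on the WAL being opened.
final class WalLengthCheck {
  private WalLengthCheck() {}

  /** Builds a message that states the values involved, not just that the check failed. */
  static String describe(String path, long realLength, long expectedLength) {
    return "WAL " + path + ": reported length " + realLength
        + " is shorter than expected length " + expectedLength;
  }

  /**
   * Returns how many bytes were written past the length known when the
   * reader was opened. Under -ea, a failed check now carries the path and
   * both lengths in the AssertionError message.
   */
  static long adjustment(String path, long realLength, long expectedLength) {
    assert realLength >= expectedLength : describe(path, realLength, expectedLength);
    return realLength - expectedLength;
  }
}
```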
It looks like we'll go back and start reading from earlier in the file, so some
edits would be replicated twice; probably not the end of the world, but it needs
to be fixed for sure.
bq. Should we harden replication source to deal with these types of assertion
errors ...
Yes. Convert it to an exception, and it sounds like a retry might be in order
here, as you suggest.
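One way to convert the assertion to an exception, sketched under assumptions: `SafeOpen` and its `Opener` interface are hypothetical stand-ins for the code path that opens the WAL reader (in the trace below, ReplicationSource.openReader). An AssertionError raised while opening is rethrown as an IOException with the original error attached as its cause, so the caller's existing IOException handling applies instead of the thread dying.

```java
import java.io.IOException;

// Hypothetical wrapper, not HBase code.
final class SafeOpen {
  /** Stand-in for whatever actually opens the WAL reader. */
  interface Opener<T> {
    T open() throws IOException;
  }

  private SafeOpen() {}

  /**
   * Runs the opener and rethrows an AssertionError (for example one tripped
   * inside getPos() under -ea) as an IOException, keeping the original error
   * as the cause so the stack trace is not lost.
   */
  static <T> T open(Opener<T> opener) throws IOException {
    try {
      return opener.open();
    } catch (AssertionError e) {
      throw new IOException("Assertion tripped while opening reader: " + e.getMessage(), e);
    }
  }
}
```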
> ReplicationSource dies under certain conditions reading a sequence file
> -----------------------------------------------------------------------
>
> Key: HBASE-13724
> URL: https://issues.apache.org/jira/browse/HBASE-13724
> Project: HBase
> Issue Type: Bug
> Reporter: churro morales
>
> A little background:
> We run our servers with assertions enabled (-ea) and have seen quite a few
> replication sources silently die over the past few months.
> Note: the stack trace below comes from a regionserver running 0.94, but from
> a quick look at this issue, I believe this will happen in 0.98 too.
> Should we harden replication source to deal with these types of assertion
> errors by catching Throwables, or should we be dealing with this at the
> sequence file reader level? We're still looking into the root cause of this
> issue, but when we manually shut down our regionservers, the regionserver
> that recovered the queue replicated that log just fine. So in our case a
> simple retry would have worked.
> {code}
> 2015-05-08 11:04:23,348 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unexpected exception in ReplicationSource, currentPath=hdfs://hm6.xxx.flurry.com:9000/hbase/.logs/xxxxx.yy.flurry.com,60020,1426792702998/xxxxx.atl.flurry.com%2C60020%2C1426792702998.1431107922449
> java.lang.AssertionError
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader$WALReaderFSDataInputStream.getPos(SequenceFileLogReader.java:121)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1489)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:178)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:734)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:583)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:373)
> {code}
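Since the reporter notes that a simple retry would have worked, here is a minimal sketch of a bounded retry around opening the reader. `RetryingOpen`, `openWithRetries`, `maxAttempts` and `sleepMs` are hypothetical names, not HBase APIs; the sketch treats an AssertionError like an IOException and waits a little longer after each failed attempt.

```java
import java.io.IOException;

// Hypothetical retry wrapper, not HBase code.
final class RetryingOpen {
  /** Stand-in for whatever actually opens the WAL reader. */
  interface Opener<T> {
    T open() throws IOException;
  }

  private RetryingOpen() {}

  /**
   * Tries to open up to maxAttempts times, sleeping sleepMs * attempt between
   * tries (linear backoff). Both IOException and AssertionError count as a
   * failed attempt; after the last one the failure is thrown as an IOException.
   */
  static <T> T openWithRetries(Opener<T> opener, int maxAttempts, long sleepMs)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return opener.open();
      } catch (IOException | AssertionError e) {
        // Normalize to IOException, keeping the AssertionError as the cause.
        last = (e instanceof IOException) ? (IOException) e : new IOException(e);
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMs * attempt);
        }
      }
    }
    throw last; // non-null here: every attempt failed
  }
}
```

Capping the attempts keeps a genuinely corrupt log from stalling the source forever; once the cap is hit, the caller still gets an ordinary IOException to handle.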
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)