[ 
https://issues.apache.org/jira/browse/HBASE-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732864#comment-13732864
 ] 

Jean-Daniel Cryans commented on HBASE-8615:
-------------------------------------------

You quoted an earlier comment, written before I had investigated as much as I 
had by the one that starts with "Here's what I know about the different 
problems", where I tried to dump my whole understanding of the problem.

bq. Are you implying I should supply a patch? I can do that but probably not 
this week unfortunately.

Nope.

bq. Or do you mean my hunch is invalid? Just checking.

It could very well be related and it could explain why we're not seeing this 
problem with an uncompressed log.
                
> HLog Compression fails in mysterious ways (working title)
> ---------------------------------------------------------
>
>                 Key: HBASE-8615
>                 URL: https://issues.apache.org/jira/browse/HBASE-8615
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Ted Yu
>            Assignee: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: 172.21.3.117%2C60020%2C1375222888304.1375222894855.zip, 
> HBASE-8615-test.patch, 
> org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed-output.txt
>
>
> In a recent test run, I noticed the following in test output:
> {code}
> 2013-05-24 22:01:02,424 DEBUG [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] fs.HFileSystem$ReorderWALBlocks(327): /user/hortonzy/hbase/.logs/kiyo.gq1.ygridcore.net,42690,1369432806911/kiyo.gq1.ygridcore.net%2C42690%2C1369432806911.1369432840428 is an HLog file, so reordering blocks, last hostname will be:kiyo.gq1.ygridcore.net
> 2013-05-24 22:01:02,429 DEBUG [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] wal.ProtobufLogReader(118): After reading the trailer: walEditsStopOffset: 132235, fileLength: 132243, trailerPresent: true
> 2013-05-24 22:01:02,438 ERROR [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] wal.ProtobufLogReader(236): Error  while reading 691 WAL KVs; started reading at 53272 and read up to 65538
> 2013-05-24 22:01:02,438 WARN  [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] regionserver.ReplicationSource(324): 2 Got:
> java.io.IOException: Error  while reading 691 WAL KVs; started reading at 53272 and read up to 65538
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:237)
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:96)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:404)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:320)
> Caused by: java.lang.IndexOutOfBoundsException: index (30062) must be less than size (1)
>         at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
>         at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:124)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:71)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:42)
>         at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.readIntoArray(WALCellCodec.java:210)
>         at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.parseCell(WALCellCodec.java:184)
>         at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
>         at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:213)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:217)
>         ... 4 more
> 2013-05-24 22:01:02,439 DEBUG [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] regionserver.ReplicationSource(583): Nothing to replicate, sleeping 100 times 10
> {code}
> Will attach test output.
