[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682677#comment-16682677
 ] 

Duo Zhang commented on HBASE-20604:
-----------------------------------

I'm OK with opening a new issue to address the remaining problems, and I can 
take charge of it, but the problem is that no one has told me what the real 
problem is... And the absence of a failing UT is not a strong reason, as I 
believe that the code we added here will not be executed in our existing UTs...

The description is not very clear about what is going on. I would like to see a 
more detailed explanation, ideally one that points out the problematic code in 
CryptoInputStream. Is it the one in Hadoop or the one in Apache Commons? Is 
there an existing JIRA about it?

Thanks.

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the WAL is rolled
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-20604
>                 URL: https://issues.apache.org/jira/browse/HBASE-20604
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 3.0.0
>            Reporter: Esteban Gutierrez
>            Assignee: Esteban Gutierrez
>            Priority: Critical
>             Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.9
>
>         Attachments: HBASE-20604.002.patch, HBASE-20604.003.patch, 
> HBASE-20604.004.patch, HBASE-20604.005.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the same 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.
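
For readers unfamiliar with the "partial data" behaviour mentioned in the description, the following is a minimal, self-contained sketch of the general java.io contract involved: a single InputStream#read call may return fewer bytes than requested even though the stream has not ended. It is not HBase, Hadoop or Commons Crypto code, and it does not claim to show what the attached patches do. The names PartialReadSketch, ChunkedInputStream, readOnce and readFully are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class PartialReadSketch {

    /** Hypothetical stream that returns at most `chunk` bytes per bulk read (chunk must be > 0). */
    static final class ChunkedInputStream extends InputStream {
        private final InputStream in;
        private final int chunk;

        ChunkedInputStream(byte[] data, int chunk) {
            this.in = new ByteArrayInputStream(data);
            this.chunk = chunk;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliberately hand back less than the caller asked for.
            return in.read(b, off, Math.min(len, chunk));
        }
    }

    /** A single read call: the result can be smaller than buf.length without being EOF. */
    static int readOnce(InputStream in, byte[] buf) throws IOException {
        return in.read(buf, 0, buf.length);
    }

    /** Keeps reading until buf is full; reports EOF only when read() returns -1. */
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " bytes");
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] record = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

        byte[] once = new byte[record.length];
        int n = readOnce(new ChunkedInputStream(record, 4), once);
        System.out.println("single read returned " + n + " of " + record.length + " bytes");

        byte[] full = new byte[record.length];
        readFully(new ChunkedInputStream(record, 4), full);
        System.out.println("readFully filled all " + full.length + " bytes");
    }
}
```

A caller that treats the short result of readOnce as end of data would see something like the premature EOF described above, whereas the loop in readFully only gives up when the underlying stream actually reports -1.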



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
