[ https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682571#comment-16682571 ]
Andrew Purtell commented on HBASE-20604:
----------------------------------------

I have no strong opinion; this could go either way. More review is certainly better late than never. However, it's not great for a contributor to have us complete a review (and it was completed, see above), then commit after due testing (which we also have), only to see the issue reopened. I argue this is a suboptimal process. A better alternative is a new Jira, where the reviewer who volunteered more suggestions can be given the option to do the work, or maybe Esteban would be interested. This is for the sake of predictability in our process.

> ProtobufLogReader#readNext can incorrectly loop to the same position in the
> stream until the WAL is rolled
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-20604
>                 URL: https://issues.apache.org/jira/browse/HBASE-20604
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 3.0.0
>            Reporter: Esteban Gutierrez
>            Assignee: Esteban Gutierrez
>            Priority: Critical
>             Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.9
>
>         Attachments: HBASE-20604.002.patch, HBASE-20604.003.patch,
> HBASE-20604.004.patch, HBASE-20604.005.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream
> associated with the {{FSDataInputStream}} of the WAL that we are reading.
> Under certain conditions, e.g. when using encryption at rest
> ({{CryptoInputStream}}), the stream can return partial data. This can cause a
> premature EOF, after which {{inputStream.getPos()}} returns to the same
> original position and {{ProtobufLogReader#readNext}} retries the same
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck
> until the WAL is rolled, causing replication delays of up to an hour in some
> cases.
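For readers unfamiliar with the failure mode described in the issue, here is a minimal, self-contained sketch of the general problem. It is not the HBASE-20604 patch and does not use any HBase or Hadoop classes. `PartialReadDemo`, `ChunkedInputStream`, `readOnce`, and `readFully` are hypothetical names invented for illustration, and `ChunkedInputStream` only stands in for a stream (such as a decrypting one) that may return fewer bytes than requested. The point is that a short read is not end-of-stream; only a return value of -1 is.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class PartialReadDemo {

    /** Hypothetical stream that returns at most {@code chunk} bytes per bulk read. */
    public static final class ChunkedInputStream extends InputStream {
        private final InputStream in;
        private final int chunk;

        public ChunkedInputStream(byte[] data, int chunk) {
            this.in = new ByteArrayInputStream(data);
            this.chunk = chunk;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Legal per the InputStream contract: fewer bytes than requested.
            return in.read(b, off, Math.min(len, chunk));
        }
    }

    /** Fragile: a single read() call; a short read looks like a truncated record. */
    public static boolean readOnce(InputStream in, byte[] record) throws IOException {
        int n = in.read(record, 0, record.length);
        return n == record.length;
    }

    /** Robust: keep reading until the record is full or the stream really ends (-1). */
    public static void readFully(InputStream in, byte[] record) throws IOException {
        int off = 0;
        while (off < record.length) {
            int n = in.read(record, off, record.length - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " bytes");
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];   // stands in for one 10-byte WAL entry
        byte[] record = new byte[10];

        // The stream holds all 10 bytes, but hands them out 4 at a time.
        System.out.println(readOnce(new ChunkedInputStream(data, 4), record)); // prints "false"

        readFully(new ChunkedInputStream(data, 4), record); // completes without throwing
        System.out.println("readFully filled the record");
    }
}
```

A caller that reacts to the `false` from `readOnce` by seeking back to the original position and retrying would see the same short read each time, which is the kind of loop the issue describes. Looping until -1, as in `readFully`, separates a partial read from a genuine end of file.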
-- This message was sent by Atlassian JIRA (v7.6.3#76005)