[ 
https://issues.apache.org/jira/browse/HBASE-15983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319397#comment-15319397
 ] 

Sean Busbey commented on HBASE-15983:
-------------------------------------

For the curious: I didn't mark this as a blocker, and I think it can be moved 
out of in-progress releases, for two reasons. First, the "every possible error 
means silently treat as end of file" behavior has been present essentially 
since we've had replication. Second, I don't yet know which versions are 
impacted by the offset error, nor how long finding its cause will take.

In the test runs I was able to perform, replaying the now-closed WAL a single 
time, once we detect an error while bytes remain unread, was sufficient to 
remove the problem entirely, so I think having that done will suffice for 
current production deployments.
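
To make the workaround above concrete, here is a rough sketch of the idea, not 
the actual ReplicationSource code. Every name in it (WalReplaySketch, 
EntryReader, FlakyReader, drainClosedWal, maxReplays) is hypothetical and 
exists only for illustration. It assumes the WAL is already cleanly closed, so 
its length is known, and that the reader can be reset to the last offset it 
parsed successfully.

```java
import java.io.IOException;

// Illustrative only: none of these types are HBase classes.
final class WalReplaySketch {

    /** Hypothetical stand-in for a WAL reader. */
    interface EntryReader {
        /** Next entry, or null when the reader believes it hit end of file. */
        String next() throws IOException;
        /** Offset just past the last successfully parsed entry. */
        long position();
        /** Reposition the reader at the given offset. */
        void reset(long offset) throws IOException;
    }

    /**
     * Reads a closed WAL to its known length. An error or an apparent EOF
     * is only accepted as a real EOF when no bytes are left; otherwise the
     * reader is reset and the remainder is replayed, at most maxReplays times.
     * Returns the number of entries read.
     */
    static int drainClosedWal(EntryReader reader, long fileLength, int maxReplays)
            throws IOException {
        int shipped = 0;
        int replays = 0;
        long lastGood = reader.position();
        while (true) {
            String entry;
            try {
                entry = reader.next();
            } catch (IOException e) {
                entry = null; // do not trust this as EOF yet; checked below
            }
            if (entry != null) {
                shipped++;
                lastGood = reader.position();
                continue;
            }
            if (lastGood >= fileLength) {
                return shipped; // genuine end of file
            }
            if (replays >= maxReplays) {
                throw new IOException("WAL ended at offset " + lastGood
                        + " but file length is " + fileLength);
            }
            replays++;
            reader.reset(lastGood); // replay from the last good offset
        }
    }

    /** In-memory reader that fails once at a chosen entry; one entry per "byte". */
    static final class FlakyReader implements EntryReader {
        private final String[] entries;
        private final int failAt;
        private boolean failed = false;
        private int index = 0;

        FlakyReader(String[] entries, int failAt) {
            this.entries = entries;
            this.failAt = failAt;
        }

        @Override public String next() throws IOException {
            if (index >= entries.length) {
                return null;
            }
            if (index == failAt && !failed) {
                failed = true;
                throw new IOException("simulated parse error");
            }
            return entries[index++];
        }

        @Override public long position() { return index; }

        @Override public void reset(long offset) { index = (int) offset; }
    }
}
```

With maxReplays set to 1 this mirrors the "replay a single time" behavior from 
my test runs. The important part is that running out of replays raises an 
error instead of quietly treating the WAL as finished.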

> Replication improperly discards data from end-of-wal in some cases.
> -------------------------------------------------------------------
>
>                 Key: HBASE-15983
>                 URL: https://issues.apache.org/jira/browse/HBASE-15983
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.98.0, 1.0.0, 1.1.0, 1.2.0
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Critical
>             Fix For: 2.0.0, 1.3.0, 1.0.4, 1.4.0, 1.2.2, 0.98.20, 1.1.6
>
>
> In some particular deployments, the Replication code believes it has
> reached EOF for a WAL prior to successfully parsing all bytes known to
> exist in a cleanly closed file.
> The underlying issue is that several distinct problems with a WAL reader 
> are all treated as end-of-file by the code in ReplicationSource that 
> decides if a given WAL is completed or needs to be retried.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
