[ https://issues.apache.org/jira/browse/HBASE-15983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319397#comment-15319397 ]
Sean Busbey commented on HBASE-15983:
-------------------------------------
For the curious: I didn't mark this as a blocker, and I think it can be moved
out of in-progress releases, because at least the "every possible error means
silently treat as end of file" behavior has been present essentially since we've
had replication. I also don't know yet which versions are impacted by the offset
error, or how long finding its cause will take.
In the test runs I was able to perform, replaying the now-closed WAL a single
time, once we detect an error while there are bytes left, was sufficient to
remove the problem entirely, so I think having that done will suffice for
current production deployments.
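
A minimal sketch of that "replay once" rule, to make the idea concrete. This is
not the actual patch: WalRetryPolicy, Decision and onReaderStopped are
hypothetical names, and what to do when the single replay still comes up short
(GIVE_UP here) is an assumption that the comment above does not cover.

```java
// Hypothetical sketch; none of these names come from the HBase code base.
final class WalRetryPolicy {
    enum Decision { FINISHED, REPLAY_ONCE, GIVE_UP }

    // Tracks whether the closed WAL has already been replayed.
    private boolean replayed = false;

    /**
     * Called when the reader stops on a WAL that is known to be closed.
     *
     * @param readerPosition byte offset at which the reader stopped
     * @param fileLength     known length of the closed WAL file
     */
    Decision onReaderStopped(long readerPosition, long fileLength) {
        if (readerPosition >= fileLength) {
            return Decision.FINISHED;     // every known byte was parsed
        }
        if (!replayed) {
            replayed = true;
            return Decision.REPLAY_ONCE;  // bytes remain: replay the WAL one time
        }
        return Decision.GIVE_UP;          // assumption: report instead of dropping data
    }
}
```

The policy object would be created per WAL, so the single-replay budget resets
for each file.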
> Replication improperly discards data from end-of-wal in some cases.
> -------------------------------------------------------------------
>
> Key: HBASE-15983
> URL: https://issues.apache.org/jira/browse/HBASE-15983
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 0.98.0, 1.0.0, 1.1.0, 1.2.0
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.0.4, 1.4.0, 1.2.2, 0.98.20, 1.1.6
>
>
> In some particular deployments, the Replication code believes it has
> reached EOF for a WAL prior to successfully parsing all bytes known to
> exist in a cleanly closed file.
> The underlying issue is that several different problems with a WAL
> reader are all treated as end-of-file by the code in ReplicationSource that
> decides whether a given WAL is completed or needs to be retried.
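
To illustrate the distinction the description draws, here is a rough sketch of
a completed-versus-retry decision that does not lump every reader problem
together with end-of-file. It is not the ReplicationSource code:
WalReadClassifier, Outcome and classify are hypothetical names, and using
java.io.EOFException as the "genuine EOF" signal is an assumption made for the
example.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch; not taken from ReplicationSource.
final class WalReadClassifier {
    enum Outcome { COMPLETE, RETRY }

    /**
     * @param error      exception raised by the reader, or null if it simply
     *                   returned no further entries
     * @param position   byte offset the reader reached
     * @param fileLength known length of the cleanly closed WAL
     */
    static Outcome classify(IOException error, long position, long fileLength) {
        boolean allBytesRead = position >= fileLength;
        boolean looksLikeEof = (error == null) || (error instanceof EOFException);
        // The WAL counts as completed only when the reader signalled
        // end-of-file AND no known bytes are left unparsed.
        if (looksLikeEof && allBytesRead) {
            return Outcome.COMPLETE;
        }
        // Any other error, or an "EOF" short of the known length, is retried
        // rather than silently treated as the end of the file.
        return Outcome.RETRY;
    }
}
```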