[ https://issues.apache.org/jira/browse/HBASE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617997#comment-14617997 ]
stack commented on HBASE-14028:
-------------------------------
I have been playing with this some more. Losing data is pretty easy to do. I am
trying to find out why the end of a WAL goes missing during replay, but there is
not enough info to debug it, and it is tough to trace where we are at any one
time. Trying to backfill that.
> DistributedLogReplay drops edits when ITBLL 125M
> ------------------------------------------------
>
> Key: HBASE-14028
> URL: https://issues.apache.org/jira/browse/HBASE-14028
> Project: HBase
> Issue Type: Bug
> Components: Recovery
> Affects Versions: 1.2.0
> Reporter: stack
>
> Testing DLR before the 1.2.0 RC gets cut, we are dropping edits.
> The issue seems to be around replay into a deployed region on a server
> that dies before all edits have finished replaying. Logging of sequenceid
> accounting is sparse, so I can't tell for sure how it is happening (or whether
> our accounting, now done by Store, is messing up DLR). Digging.
> I also notice that DLR does not refresh its cache of region locations on error
> -- it just keeps retrying until the whole WAL fails: 8 retries, about 30
> seconds. We could do a bit of refactoring and have the replay find the region
> in its new location if it moved during DLR replay.
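
The relocate-on-error idea in the description could look roughly like the Java sketch below. Everything in it is hypothetical: LocationLookup, EditSender, and replayWithRelocation are illustrative stand-ins and not HBase APIs, and the attempt limit is just a parameter (the description mentions 8 retries). It only shows the shape of the change: after a failed send, force a fresh location lookup instead of reusing the cached one.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of retry-with-relocation; none of these types are HBase APIs. */
final class ReplayRetrySketch {

    /** Returns the server hosting a region; forceRefresh bypasses any cached location. */
    interface LocationLookup {
        String locate(String region, boolean forceRefresh);
    }

    /** Sends a batch of edits to a server; throws if the send fails (e.g. region moved). */
    interface EditSender {
        void replay(String server, String region, List<String> edits) throws IOException;
    }

    /**
     * Tries to replay the edits up to maxAttempts times. The first attempt may use
     * the cached location; every attempt after a failure forces a fresh lookup.
     * Returns the server that accepted the edits.
     */
    static String replayWithRelocation(LocationLookup lookup, EditSender sender,
            String region, List<String> edits, int maxAttempts) throws IOException {
        IOException last = null;
        boolean refresh = false;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String server = lookup.locate(region, refresh);
            try {
                sender.replay(server, region, edits);
                return server;
            } catch (IOException e) {
                last = e;
                refresh = true; // cached location is suspect, so look it up again
            }
        }
        throw new IOException("replay failed after " + maxAttempts + " attempts", last);
    }
}
```

With a lookup that returns a stale server from cache and the new server on refresh, the second attempt lands on the new server rather than burning all attempts against the old one. A real change would also need backoff between attempts and care around sequenceid accounting, which this sketch leaves out.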