[ https://issues.apache.org/jira/browse/HBASE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615977#comment-14615977 ]
stack commented on HBASE-14028:
-------------------------------
bq. This -recovery-from-failure-during-recovery-from-failure thing looks quite
complicated to me.
Yes. It should work; all the pieces are there. Smile. I've done a few more
runs and it passes only sometimes. Let me try to figure out where the hole is.
> DistributedLogReplay drops edits when ITBLL 125M
> ------------------------------------------------
>
> Key: HBASE-14028
> URL: https://issues.apache.org/jira/browse/HBASE-14028
> Project: HBase
> Issue Type: Bug
> Components: Recovery
> Affects Versions: 1.2.0
> Reporter: stack
>
> Testing DLR before the 1.2.0 RC gets cut, we are dropping edits.
> The issue seems to be around replay into a deployed region that is on a
> server that dies before all edits have finished replaying. Logging is sparse
> on sequenceid accounting, so I can't tell for sure how it is happening (or
> whether our accounting, now done by Store, is messing up DLR). Digging.
> I also notice that DLR does not refresh its cached region location on error;
> it just keeps retrying until the whole WAL fails (8 retries, about 30
> seconds). With a bit of refactoring, the replay could find the region in its
> new location if it moved during DLR replay.
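
The refactor suggested in the description could look roughly like the sketch below: drop the cached location when a replay attempt fails, so the next attempt looks the region up again instead of hitting the same dead server. This is not HBase code. The class ReplayRetrySketch, the ReplayTarget interface, the locator function and the seedCache helper are all hypothetical names made up for illustration, and the retry count of 8 is only borrowed from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not HBase's actual replay path.
final class ReplayRetrySketch {

    /** Hypothetical stand-in for a replay RPC; returns true if the edit was accepted. */
    interface ReplayTarget {
        boolean replay(String server, String region, String edit);
    }

    private final Map<String, String> locationCache = new HashMap<>();
    private final Function<String, String> locator; // assumed authoritative region lookup
    private final ReplayTarget target;

    ReplayRetrySketch(Function<String, String> locator, ReplayTarget target) {
        this.locator = locator;
        this.target = target;
    }

    /** Pre-populates the cache, e.g. with a location that later goes stale. */
    void seedCache(String region, String server) {
        locationCache.put(region, server);
    }

    /** Returns the number of attempts used, or -1 if every attempt failed. */
    int replayWithRefresh(String region, String edit, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Use the cached location if present, otherwise resolve it now.
            String server = locationCache.computeIfAbsent(region, locator);
            if (target.replay(server, region, edit)) {
                return attempt;
            }
            // Drop the stale entry so the next attempt re-resolves the region.
            locationCache.remove(region);
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, String> actual = new HashMap<>();
        actual.put("r1", "serverB"); // the region has moved to serverB
        ReplayRetrySketch sketch = new ReplayRetrySketch(
            actual::get,
            (server, region, edit) -> server != null && server.equals(actual.get(region)));
        sketch.seedCache("r1", "serverA"); // stale cached location
        System.out.println(sketch.replayWithRefresh("r1", "edit-1", 8)); // prints 2
    }
}
```

In this toy run the first attempt goes to the stale server and fails, the cache entry is dropped, and the second attempt succeeds at the new location, rather than all 8 retries being spent on the dead server.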