[jira] [Commented] (HBASE-14028) DistributedLogReplay drops edits when ITBLL 125M

stack (JIRA) Mon, 06 Jul 2015 17:35:25 -0700

    [ https://issues.apache.org/jira/browse/HBASE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615977#comment-14615977 ]


stack commented on HBASE-14028:
-------------------------------

bq. This -recovery-from-failure-during-recovery-from-failure thing looks quite 
complicated to me. 

Yes. It should work; all the pieces are there. Smile. I've done a few more 
runs and it passes sometimes. Let me try to figure out where the hole is.

> DistributedLogReplay drops edits when ITBLL 125M
> ------------------------------------------------
>
>                 Key: HBASE-14028
>                 URL: https://issues.apache.org/jira/browse/HBASE-14028
>             Project: HBase
>          Issue Type: Bug
>          Components: Recovery
>    Affects Versions: 1.2.0
>            Reporter: stack
>
> While testing DLR before the 1.2.0 RC gets cut, we are dropping edits.
> The issue seems to be around replay into a deployed region on a server
> that dies before all edits have finished replaying. Logging of sequenceid
> accounting is sparse, so I can't tell for sure how it is happening (or
> whether our accounting, now done by Store, is messing up DLR). Digging.
> I also notice that DLR does not refresh its cache of region locations on
> error -- it just keeps retrying until the whole WAL fails: 8 retries, about
> 30 seconds. We could refactor a bit and have the replay find the region in
> its new location if it moved during DLR replay.
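
To make the suggested refactor concrete, here is a minimal sketch of a retry loop that drops its cached region location after a failed attempt, so the next attempt looks the region up again instead of hammering the old server. This is not HBase code: `ReplayRetrySketch`, `EditSender`, the `locator` function, and the string-typed server and region names are all hypothetical stand-ins, and the real replay path may be structured quite differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch only; none of these names are HBase APIs. */
class ReplayRetrySketch {

    /** Stand-in for whatever ships a batch of edits to a server. */
    interface EditSender {
        /** Returns true if the given server accepted the edits for the region. */
        boolean send(String server, String region);
    }

    // Cached region -> server mapping, analogous to a region-location cache.
    private final Map<String, String> locationCache = new HashMap<>();
    // Authoritative lookup, assumed to reflect region moves.
    private final Function<String, String> locator;
    private final EditSender sender;
    private final int maxAttempts;

    ReplayRetrySketch(Function<String, String> locator, EditSender sender, int maxAttempts) {
        this.locator = locator;
        this.sender = sender;
        this.maxAttempts = maxAttempts;
    }

    /** Tries to replay edits for a region, re-resolving its location after each failure. */
    boolean replay(String region) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Use the cached location if present, otherwise ask the locator.
            String server = locationCache.computeIfAbsent(region, locator);
            if (server != null && sender.send(server, region)) {
                return true;
            }
            // The cached location may be stale (server died, region moved):
            // forget it so the next attempt does a fresh lookup.
            locationCache.remove(region);
        }
        return false;
    }
}
```

With a loop shaped like this, a region that is reassigned partway through replay is found at its new location on the next attempt rather than after all retries are exhausted. A real implementation would also need backoff between attempts, which is omitted here.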



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
