[ https://issues.apache.org/jira/browse/HBASE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615868#comment-14615868 ]

Vladimir Rodionov commented on HBASE-14028:
-------------------------------------------

This recovery-from-failure-during-recovery-from-failure thing looks quite
complicated to me. I am working on HBASE-7912, and one of the improvements on
the list is running WALPlayer into HFiles followed by a bulk load. Pounding
HBase with millions of individual puts is not the right approach.
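The puts-versus-bulk-load contrast can be made concrete with a toy Java model. It is not WALPlayer's or HBase's actual API, and `WalEdit` and `BulkLoadSketch` are hypothetical names. The model assumes each WAL edit is already tagged with its target region. It also ignores timestamps and multiple versions, keeping only the last value per row. Replaying by puts costs one write request per edit. The bulk-load path instead groups edits per region, sorts them by row key as an HFile requires, and loads one file per region.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified WAL entry: target region, row key, and value.
// Real WAL entries also carry timestamps, sequence ids, and column families.
final class WalEdit {
    final String region;
    final String row;
    final String value;

    WalEdit(String region, String row, String value) {
        this.region = region;
        this.row = row;
        this.value = value;
    }
}

final class BulkLoadSketch {
    // Replay by puts: every WAL edit becomes its own write request.
    static int putRequestCount(List<WalEdit> edits) {
        return edits.size();
    }

    // Replay by bulk load: group edits per region and keep each group sorted by
    // row key, as an HFile must be. Each sorted map stands in for one HFile.
    static Map<String, TreeMap<String, String>> buildSortedFiles(List<WalEdit> edits) {
        Map<String, TreeMap<String, String>> files = new TreeMap<>();
        for (WalEdit e : edits) {
            // Edits are visited in WAL order, so a later edit to the same row
            // overwrites an earlier one (versions are ignored in this toy model).
            files.computeIfAbsent(e.region, r -> new TreeMap<>()).put(e.row, e.value);
        }
        return files;
    }

    // Bulk load: one load operation per generated file, not one per edit.
    static int bulkLoadRequestCount(Map<String, TreeMap<String, String>> files) {
        return files.size();
    }
}
```

Consider five edits spread over two regions. The put path issues five requests and the bulk-load path issues two. The request count grows with the number of edits in the first case and with the number of regions in the second.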

> DistributedLogReplay drops edits when ITBLL 125M
> ------------------------------------------------
>
>                 Key: HBASE-14028
>                 URL: https://issues.apache.org/jira/browse/HBASE-14028
>             Project: HBase
>          Issue Type: Bug
>          Components: Recovery
>    Affects Versions: 1.2.0
>            Reporter: stack
>
> Testing DLR before 1.2.0RC gets cut, we found that we are dropping edits.
> The issue seems to be around replay into a deployed region that is on a server
> that dies before all edits have finished replaying. Logging is sparse on
> sequenceid accounting, so we can't tell for sure how it is happening (or
> whether our now doing sequenceid accounting by Store is messing up DLR). Digging.
> I also notice that DLR does not refresh its cache of region locations on error;
> it just keeps retrying until the whole WAL fails (8 retries, about 30
> seconds). We could do a bit of refactoring and have the replay find the region
> in its new location if it moves during DLR replay.
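Here is one way per-Store accounting could drop edits, sketched under simplifying assumptions. Each Store (column family) records the highest sequence id it has persisted. Replay then skips any edit at or below that value for its family. `ReplayEdit` and `StoreSeqIdFilter` are hypothetical names, not HBase classes. If the recorded value ever runs ahead of what was actually persisted, the affected edits are silently skipped. The issue does not establish that this is what is happening here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replayed edit: target column family (Store) and its WAL sequence id.
final class ReplayEdit {
    final String family;
    final long seqId;

    ReplayEdit(String family, long seqId) {
        this.family = family;
        this.seqId = seqId;
    }
}

final class StoreSeqIdFilter {
    // Highest sequence id each Store reports as already persisted.
    private final Map<String, Long> maxPersistedSeqId = new HashMap<>();

    void recordPersisted(String family, long seqId) {
        maxPersistedSeqId.merge(family, seqId, Long::max);
    }

    // Returns the edits that still need to be applied. Anything at or below the
    // Store's recorded sequence id is treated as already durable and skipped,
    // so an over-reported value here silently drops edits.
    List<ReplayEdit> editsToApply(List<ReplayEdit> edits) {
        List<ReplayEdit> out = new ArrayList<>();
        for (ReplayEdit e : edits) {
            long persisted = maxPersistedSeqId.getOrDefault(e.family, -1L);
            if (e.seqId > persisted) {
                out.add(e);
            }
        }
        return out;
    }
}
```

Suppose `cf1` has persisted through sequence id 10. An edit with id 9 is then skipped and one with id 11 is applied. If `cf2` were wrongly recorded as persisted through 20, a `cf2` edit with id 5 would also be dropped.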

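The refactor suggested in the issue could look roughly like this. On each failed attempt the cached location is dropped, so the next attempt re-resolves the region instead of retrying a dead server. This is a hedged sketch with hypothetical names. `RelocatingReplayer` is invented for illustration. The `lookup` function stands in for a meta lookup, and the `send` predicate stands in for the replay RPC. Backoff between attempts is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

final class RelocatingReplayer {
    private final Map<String, String> locationCache = new HashMap<>();
    private final Function<String, String> lookup;   // region -> current server (stand-in for a meta lookup)
    private final BiPredicate<String, String> send;  // (server, region) -> true if the replay succeeded
    private final int maxAttempts;
    int lookups = 0;                                  // number of location lookups performed

    RelocatingReplayer(Function<String, String> lookup,
                       BiPredicate<String, String> send,
                       int maxAttempts) {
        this.lookup = lookup;
        this.send = send;
        this.maxAttempts = maxAttempts;
    }

    boolean replay(String region) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Use the cached location if present; otherwise resolve it afresh.
            String server = locationCache.computeIfAbsent(region, r -> {
                lookups++;
                return lookup.apply(r);
            });
            if (send.test(server, region)) {
                return true;
            }
            // The server may have died or the region may have moved: forget the
            // cached entry so the next attempt looks the region up again.
            locationCache.remove(region);
        }
        return false; // gave up after maxAttempts
    }
}
```

Suppose the region moves from `rs1` to `rs2` after its server dies. The first attempt fails and the cache entry is dropped. The second attempt then succeeds against `rs2`, rather than burning all 8 retries on `rs1`.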


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
