[ https://issues.apache.org/jira/browse/HBASE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615868#comment-14615868 ]
Vladimir Rodionov commented on HBASE-14028:
-------------------------------------------

This recovery-from-failure-during-recovery-from-failure thing looks quite complicated to me. I am working on HBASE-7912, and one of the improvements on the list is running WALPlayer to write HFiles, followed by a bulk load. Pounding HBase with millions of puts is not the right approach.

> DistributedLogReplay drops edits when ITBLL 125M
> ------------------------------------------------
>
>                 Key: HBASE-14028
>                 URL: https://issues.apache.org/jira/browse/HBASE-14028
>             Project: HBase
>          Issue Type: Bug
>          Components: Recovery
>    Affects Versions: 1.2.0
>            Reporter: stack
>
> Testing DLR before the 1.2.0 RC gets cut, we are dropping edits.
> The issue seems to be around replay into a deployed region that is on a server
> that dies before all edits have finished replaying. Logging is sparse on
> sequenceid accounting, so I can't tell for sure how it is happening (or whether
> our accounting, now done per Store, is messing up DLR). Digging.
> I also notice that DLR does not refresh its cached region location on error;
> it just keeps retrying until the whole WAL fails (8 retries, about 30
> seconds). We could do a bit of refactoring and have the replay find the region
> in its new location if it moved during DLR replay.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
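The refactor suggested in the last quoted paragraph (re-locate the region after a failed replay attempt instead of retrying the same stale location) can be sketched as below. This is a minimal, language-neutral illustration in Python, not HBase code: `LocationCache`, `RegionMovedError`, `replay_with_refresh`, the `lookup` and `send` callables, and the server names are all hypothetical stand-ins, and real retry logic would also need backoff, which is omitted here. Only the 8-retry limit comes from the issue text.

```python
class RegionMovedError(Exception):
    """Hypothetical error: the contacted server no longer hosts the region."""


class LocationCache:
    """Caches region -> server lookups; `lookup` stands in for a meta query."""

    def __init__(self, lookup):
        self._lookup = lookup
        self._cache = {}

    def locate(self, region):
        # Only ask the (hypothetical) meta lookup on a cache miss.
        if region not in self._cache:
            self._cache[region] = self._lookup(region)
        return self._cache[region]

    def invalidate(self, region):
        self._cache.pop(region, None)


def replay_with_refresh(edits, region, cache, send, max_retries=8):
    """Replay edits into `region`, re-locating it after every failed attempt.

    Returns the number of attempts used; raises after `max_retries` failures.
    """
    for attempt in range(1, max_retries + 1):
        server = cache.locate(region)
        try:
            send(server, region, edits)
            return attempt
        except RegionMovedError:
            # Drop the stale entry so the next attempt looks the region up
            # again instead of retrying the same (possibly dead) server.
            cache.invalidate(region)
    raise RuntimeError(f"replay of {region} failed after {max_retries} attempts")


# Simulated scenario: region r1 starts on rs1, which dies mid-replay,
# and the region is reassigned to rs2.
meta = {"r1": "rs1"}
delivered = []


def send(server, region, edits):
    if server == "rs1":
        meta[region] = "rs2"  # simulate the reassignment after rs1 dies
        raise RegionMovedError(region)
    delivered.append((server, region, list(edits)))


cache = LocationCache(lambda r: meta[r])
attempts = replay_with_refresh(["put1", "put2"], "r1", cache, send)
print(attempts, delivered)  # 2 [('rs2', 'r1', ['put1', 'put2'])]
```

Without the `invalidate` call, every retry would go back to `rs1` and the replay would exhaust all attempts, which mirrors the behavior described in the issue.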