[ 
https://issues.apache.org/jira/browse/HBASE-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2727:
-------------------------

    Attachment: 2727-v4.txt

This passes all tests.  Still need to add tests that crash a regionserver 
hosting a region that is replaying edits from a split, to ensure we do not 
lose edits when the region is opened subsequently.

After study, the fix for HBASE-1025, "Reconstruction log playback has no bounds 
on memory used", turns out to be not so smart.  It played recovered edits using 
the 'normal' Region put/delete paths, with edits applied to the WAL.  The notion 
was that should we need to flush -- because of global memory pressure or 
because a region was in excess of its configured memstore size -- then we'd 
just use the default flush mechanism.  Well, this won't work should we crash 
during a replay.  The default flush uses the regionserver/hlog sequenceid.  
When replaying recovered edits, the affected region is not yet online; its 
contribution to the regionserver/hlog sequenceid has not yet been made.  
Therefore, the current regionserver/hlog sequenceid could mess us up.  If it is 
far in excess of the recovering region's sequenceid and we crash during 
recovery, then on the next replay we'll skip all edits.
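
To make the failure concrete, here is a minimal sketch in Java.  It is a toy 
model, not HBase code: the class and method names are made up, and it assumes 
that replay skips any recovered edit whose sequenceid is not above the highest 
sequenceid recorded in the region's store files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; these are not HBase's actual classes.
class ReplaySkipSketch {
    /**
     * Returns the sequenceids a replay would apply. Assumption: an edit is
     * skipped when its sequenceid is not above the highest sequenceid
     * already recorded in the region's store files.
     */
    static List<Long> editsToReplay(long[] recoveredEdits, long maxStoreFileSeqId) {
        List<Long> applied = new ArrayList<>();
        for (long seqId : recoveredEdits) {
            if (seqId > maxStoreFileSeqId) {
                applied.add(seqId);
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        long[] recovered = {10, 11, 12, 13, 14, 15};

        // First replay: store files top out at 9, so every edit is applied.
        System.out.println(editsToReplay(recovered, 9));

        // A flush during that replay was stamped with the regionserver-wide
        // sequenceid (say 1000), and then the server crashed. On the next
        // replay every recovered edit looks old and is skipped.
        System.out.println(editsToReplay(recovered, 1000));
    }
}
```

In this model the second call returns an empty list: edits 13 through 15 were 
never flushed, yet nothing replays them.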

This patch does replay of recovered edits all in the scope of the recovering 
region; flushes, if they have to happen, are done using sequenceids that make 
sense in the context of this region only.  We don't use the regionserver/hlog 
sequenceid.  Should we crash during recovery, we'll go through the same 
recovery again with the initial sequenceid taken from the storefiles and this 
region's recovered edits rather than from the hlog/regionserver.
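
A rough sketch of the region-scoped idea follows, again as a toy Java model 
rather than the code in the patch.  All names are hypothetical, and the flush 
trigger is reduced to a simple count of buffered edits.  The point it tries to 
show is that a flush during replay is stamped with the sequenceid of the last 
edit replayed, so a second pass after a crash resumes from the right place.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of region-scoped replay; not HBase's classes.
class RegionScopedReplaySketch {
    final List<Long> persisted = new ArrayList<>(); // edits that reached a store file
    final List<Long> memstore = new ArrayList<>();  // edits held only in memory
    long maxStoreFileSeqId;                         // highest sequenceid in any store file
    final int flushThreshold;                       // flush once this many edits are buffered

    RegionScopedReplaySketch(long maxStoreFileSeqId, int flushThreshold) {
        this.maxStoreFileSeqId = maxStoreFileSeqId;
        this.flushThreshold = flushThreshold;
    }

    // The flush stamps the new store file with the last replayed edit's
    // sequenceid, never with a regionserver-wide counter.
    private void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        maxStoreFileSeqId = memstore.get(memstore.size() - 1);
        persisted.addAll(memstore);
        memstore.clear();
    }

    /**
     * Replays edits given in ascending sequenceid order and returns how many
     * were applied. A crash is simulated once crashAfter edits have been
     * applied; pass Integer.MAX_VALUE for a run without a crash.
     */
    int replay(long[] recoveredEdits, int crashAfter) {
        int applied = 0;
        for (long seqId : recoveredEdits) {
            if (seqId <= maxStoreFileSeqId) {
                continue; // already covered by a store file
            }
            memstore.add(seqId);
            applied++;
            if (memstore.size() >= flushThreshold) {
                flush();
            }
            if (applied == crashAfter) {
                memstore.clear(); // crash: unflushed edits vanish from memory
                return applied;
            }
        }
        flush(); // replay finished; persist whatever is left
        return applied;
    }
}
```

With edits 10 through 15, store files at 9, and a flush every two edits, a 
crash after the third edit leaves the store files at 11.  The second pass 
skips 10 and 11 and replays 12 through 15, so no edit is lost.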



> Splits writing one file only is untenable; need dir of recovered edits 
> ordered by sequenceid.
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2727
>                 URL: https://issues.apache.org/jira/browse/HBASE-2727
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>         Attachments: 2727-v2.txt, 2727-v4.txt
>
>
> This issue comes of tlipcon doing a bit of human unit testing.  His 
> speculation is:
> Let a region X deploy to server A.  Server A opens the region, then closes it.
> Let region X now deploy to server B.  Server B now crashes.
> Both server A and server B now have edits for region X in their WALs.
> The processing of server crashes is currently sequential. 
> If server A crashes before server B, server A will write out a file of 
> recovered edits for region X but region X was not deployed on server A so, 
> the file will just sit there unused.  The processing of server B's crash 
> will overwrite the recovered edits file written by the split of server A's 
> WAL.  This is ok.
> But if somehow, server B processing is done before server A's, then 
> interesting issues will likely arise; in the main, there is danger that the 
> server B's recovered edits could be overwritten.
> Another issue comes up in the review of HBASE-1025.  During the replay of 
> edits on region deploy, if the hosting regionserver crashes before we have 
> processed all of the recovered edits, we could lose some (the recovery of the 
> regionserver that is replaying the edits could overwrite the log of edits 
> only partially replayed).
> Discussing up on IRC, what's needed is a directory of edits to replay, 
> ordered by sequenceid.  On recovery, we play the oldest through to the 
> newest, removing the edits only on successful replay.
> Making blocker on 0.21 since this is a correctness issue.
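
The directory-of-edits idea from the description above can be sketched as 
follows.  This is an illustration only: the class, the Replayer interface, and 
the convention of naming each file by its sequenceid are assumptions made for 
the example, not the layout the patch uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch; assumes each file in the directory is named by the
// sequenceid of the edits it holds (for example "0000000010").
class RecoveredEditsDirSketch {
    /** Hypothetical callback that applies one file of edits to the region. */
    interface Replayer {
        void replay(Path editsFile) throws IOException;
    }

    /** Lists the files in dir, oldest sequenceid first. */
    static List<Path> sortedBySeqId(Path dir) throws IOException {
        List<Path> sorted = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.forEach(sorted::add);
        }
        sorted.sort(Comparator.comparingLong(
            (Path p) -> Long.parseLong(p.getFileName().toString())));
        return sorted;
    }

    /**
     * Replays every file from oldest to newest and returns how many were
     * replayed. A file is deleted only after its replay returns normally,
     * so a failure part-way through leaves it in place for the next attempt.
     */
    static int replayAll(Path dir, Replayer replayer) throws IOException {
        int replayed = 0;
        for (Path editsFile : sortedBySeqId(dir)) {
            replayer.replay(editsFile); // on an exception the file stays
            Files.delete(editsFile);
            replayed++;
        }
        return replayed;
    }
}
```

Because nothing is overwritten and removal happens only after success, a crash 
during replay leaves the unprocessed files in the directory, and the next 
recovery can pick them up in the same order.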

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
