[ 
https://issues.apache.org/jira/browse/HBASE-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878749#action_12878749
 ] 

HBase Review Board commented on HBASE-1025:
-------------------------------------------

Message from: "Todd Lipcon" <[email protected]>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/179/#review216
-----------------------------------------------------------


Regarding rewriting recovered edits back to the WAL, I'm afraid we may have an 
issue with the following situation:
- RS A writes edits 1-100 to log
- RS A fails
- RS B starts recovering log, rewrites edits 1-50
- RS B fails while recovering
- RS C starts up. It *should* replay 1-100. Instead I fear it might only replay 
1-50, or replay 1-100 followed by 1-50, either of which would be incorrect.

We should add a test case where the RS fails in the middle of recovering a 
prior failure.
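To make the hazard concrete, here is a small hand-rolled simulation (not HBase code; all names are made up) of why a recovering server must skip edits at or below the seqid already persisted by a prior, partially completed recovery:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of recovered-edit replay (hypothetical, not HBase APIs).
// If RS B flushed edits 1-50 to store files before dying, RS C must
// replay only 51-100; re-applying (or stopping at) 1-50 is incorrect.
public class ReplaySketch {
  // Replays edits [1..logMax], skipping anything already persisted.
  static List<Long> replay(long logMax, long maxSeqIdInStores) {
    List<Long> applied = new ArrayList<>();
    for (long seqId = 1; seqId <= logMax; seqId++) {
      if (seqId <= maxSeqIdInStores) {
        continue; // already durable from the earlier partial recovery
      }
      applied.add(seqId);
    }
    return applied;
  }
}
```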


Also, replayRecoveredEdits returns the highest seqid from the logs. Is it 
possible in any situation for store files to have a higher seqid than what's 
in the log? Perhaps we should add some kind of assert.
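The suggested sanity check could look something like this (a sketch only; the method and variable names are assumed, not the actual HRegion code):

```java
// Sketch of the proposed assert (hypothetical names): store files should
// never claim a seqid beyond anything seen in the recovered logs.
public class SeqIdCheck {
  static long checkedMaxSeqId(long maxSeqIdFromLogs, long maxSeqIdFromStores) {
    if (maxSeqIdFromStores > maxSeqIdFromLogs) {
      throw new IllegalStateException("store files have seqid "
          + maxSeqIdFromStores + " > max log seqid " + maxSeqIdFromLogs);
    }
    return maxSeqIdFromLogs;
  }
}
```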


src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.hbase.org/r/179/#comment973>

    this returns -1 if there is nothing to replay. Shouldn't it return the 
maxSeqId + 1 perhaps?
    
    Do we have a test which makes a region, writes some data, closes the 
region, opens the region, writes some data, then recovers logs?
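A sketch of the return-value convention being suggested (hypothetical; the real replayRecoveredEdits carries more state): returning maxSeqId + 1 puts the "nothing to replay" case and the normal case on the same code path for the caller, instead of special-casing -1.

```java
// Hypothetical sketch: have replay return "the next seqid to use"
// rather than -1 when there was nothing to replay.
public class NextSeqId {
  static long nextSeqIdAfterReplay(long maxSeqIdReplayed, long maxSeqIdFromStores) {
    // maxSeqIdReplayed == -1 means "no edits replayed"; taking the max
    // with the store files' seqid removes the caller's special case.
    return Math.max(maxSeqIdReplayed, maxSeqIdFromStores) + 1;
  }
}
```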



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.hbase.org/r/179/#comment972>

    we used to print the sequence ID after the increment, now we print before 
the increment. Maybe change the log message to "sequenceid for next edit will 
be NNN"



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.hbase.org/r/179/#comment975>

    add some javadoc - I think it's important to note that minSeqId is 
*exclusive* - it's a little unintuitive. Perhaps it makes more sense to pass 
in maxSeqIdFromStoreFiles + 1 here, and make minSeqId be "minimum sequence 
id, inclusive, to apply"
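A toy illustration of the exclusive-vs-inclusive distinction (variable names are assumed): both forms skip the already-persisted edit 50 and apply edit 51, but the inclusive form says so in its name.

```java
public class MinSeqIdSketch {
  // Current (exclusive) contract: apply edits with seqId > minSeqId,
  // where minSeqId is the max seqid found in the store files.
  static boolean appliesExclusive(long seqId, long minSeqId) {
    return seqId > minSeqId;
  }
  // Suggested (inclusive) contract: the caller passes
  // maxSeqIdFromStoreFiles + 1 and edits with seqId >= minSeqId apply.
  static boolean appliesInclusive(long seqId, long minSeqId) {
    return seqId >= minSeqId;
  }
}
```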



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.hbase.org/r/179/#comment976>

    should probably use LOG.error for these



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.hbase.org/r/179/#comment974>

    I agree, I don't think the splitter would ever write a partial one. We 
should treat an IOE here as a real error.



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.hbase.org/r/179/#comment977>

    when would this happen? deserves a comment I think



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.hbase.org/r/179/#comment978>

    can we rename maxSeqIdInLog to currentEditSeqId or something? since that's 
what we're really talking about here. Also, reversing this inequality to 
if (currentEditSeqId < storeMaxSeqId) would read more clearly, I think.



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.hbase.org/r/179/#comment979>

    I think this will crash when the region contains stores that were the 
result of an HFOF bulk load (they have no sequence id)
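A sketch of the kind of guard that might be needed (an assumption on my part: bulk-loaded HFiles report no sequence id, modeled here as null):

```java
// Hypothetical guard: store files produced by a bulk load (e.g. via
// HFileOutputFormat) may carry no sequence id. Treat a missing id as
// "older than everything" instead of dereferencing it and crashing.
public class BulkLoadGuard {
  static long maxSeqIdAcrossStores(Long[] perStoreSeqIds) {
    long max = -1;
    for (Long seqId : perStoreSeqIds) {
      if (seqId == null) continue; // bulk-loaded file: no seqid recorded
      max = Math.max(max, seqId);
    }
    return max;
  }
}
```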



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.hbase.org/r/179/#comment980>

    maybe rename to restoreEdit() or something to be clear that this is during 
restoration only?



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.hbase.org/r/179/#comment981>

    possibly more efficient:
    Collections.singletonMap(kv.getFamily(), Collections.singletonList(kv))
    
    though there may be a comparator issue?
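The comparator worry is real if downstream code expects the family map keyed by byte[] content: Collections.singletonMap compares keys with equals(), and byte[] equals is identity, so a lookup with a different array of the same bytes misses. A self-contained demonstration (Arrays.compare stands in here for HBase's Bytes.BYTES_COMPARATOR):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class FamilyMapSketch {
  // singletonMap: byte[] keys matched by identity via equals().
  static boolean singletonMapFinds(byte[] putKey, byte[] lookupKey) {
    Map<byte[], String> m = Collections.singletonMap(putKey, "edits");
    return m.containsKey(lookupKey);
  }
  // TreeMap with a content comparator (Bytes.BYTES_COMPARATOR in HBase)
  // matches byte[] keys by their contents.
  static boolean treeMapFinds(byte[] putKey, byte[] lookupKey) {
    Map<byte[], String> m = new TreeMap<>((a, b) -> Arrays.compare(a, b));
    m.put(putKey, "edits");
    return m.containsKey(lookupKey);
  }
}
```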



src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<http://review.hbase.org/r/179/#comment970>

    comment is now out of date, right?



src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
<http://review.hbase.org/r/179/#comment971>

    maybe just use Preconditions.checkArgument here so it throws if you try 
to pass a non-DFS filesystem?
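Something like the following (a sketch; Preconditions is Guava's, so a minimal stand-in is inlined to keep this self-contained, and the exact filesystem check is assumed):

```java
// Sketch of the suggested precondition. checkArgument here is a stand-in
// for com.google.common.base.Preconditions.checkArgument.
public class FsCheck {
  static void checkArgument(boolean condition, String message) {
    if (!condition) throw new IllegalArgumentException(message);
  }
  // Hypothetical guard: reject anything that is not an HDFS filesystem.
  static void requireDfs(String fsScheme) {
    checkArgument("hdfs".equals(fsScheme),
        "expected a DistributedFileSystem, got scheme: " + fsScheme);
  }
}
```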


- Todd





> Reconstruction log playback has no bounds on memory used
> --------------------------------------------------------
>
>                 Key: HBASE-1025
>                 URL: https://issues.apache.org/jira/browse/HBASE-1025
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.21.0
>
>         Attachments: 1025-v2.txt, 1025-v3.txt, 1025-v5.patch, 1025.txt
>
>
> Makes a TreeMap and just keeps adding edits without regard for the size of 
> the edits applied; could cause an OOME (I've not seen a definitive case, 
> though I have seen an OOME around the time of a reconstruction log replay 
> -- perhaps this was the straw that broke the flea's antlers?)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
