[ 
https://issues.apache.org/jira/browse/HBASE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064925#comment-13064925
 ] 

[email protected] commented on HBASE-3872:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1097/#review1050
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
<https://reviews.apache.org/r/1097/#comment2123>

    Good one.  You +1 on committing?


- Michael


On 2011-07-13 04:06:54, Michael Stack wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1097/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-07-13 04:06:54)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Fix is two-fold.
bq.  
bq.  1. Fix CatalogJanitor so that it does not remove the parent if daughter regions are not present.
bq.  2. Fix SplitTransaction so that if we go past the point of no return (PONR), we DO NOT REMOVE daughter regions as part of cleanup; leave them in place.  Because we went past PONR, we are going to abort, and server shutdown processing has what it needs to do fixup.
bq.  
bq.  
bq.  This addresses bug hbase-3872.
bq.      https://issues.apache.org/jira/browse/hbase-3872
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 7082342
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 977115d
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java c402b87
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java 16e970e
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java ab88766
bq.    src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 49333ac
bq.    src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 56fc0b8
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 98a7ee7
bq.    src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java 4a35d09
bq.  
bq.  Diff: https://reviews.apache.org/r/1097/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Added two new unit tests.  One tests that we do not remove the parent if a daughter region is missing from the fs; the other tests that the right answer pops out of the call to rollback if we go past PONR (and that the daughter regions are still in place after rollback).
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.
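
The rollback behaviour described in item 2 of the summary can be sketched roughly as below. This is a minimal illustration, not the code in SplitTransaction.java: the class name, the Step enum, and the boolean return convention (false meaning "past PONR, caller should abort") are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a journaled split; names are illustrative only. */
class SplitRollbackSketch {
    /** Steps recorded as the split progresses (assumed, simplified). */
    enum Step { CREATED_DAUGHTER_A, CREATED_DAUGHTER_B, PONR }

    private final List<Step> journal = new ArrayList<>();
    // Stands in for daughter region directories that exist on disk.
    private final Set<Step> daughtersOnDisk = EnumSet.noneOf(Step.class);

    /** Record a completed step; daughter-creation steps leave a directory behind. */
    void record(Step step) {
        journal.add(step);
        if (step != Step.PONR) {
            daughtersOnDisk.add(step);
        }
    }

    /**
     * Undo the split if that is still safe.
     *
     * @return true if everything was undone; false if the split had passed the
     *         point of no return, in which case the daughters are left in place
     *         and the caller is expected to abort so that shutdown processing
     *         can do the fixup.
     */
    boolean rollback() {
        if (journal.contains(Step.PONR)) {
            return false; // do NOT remove daughter regions
        }
        // Undo in reverse order of execution.
        for (int i = journal.size() - 1; i >= 0; i--) {
            daughtersOnDisk.remove(journal.get(i));
        }
        journal.clear();
        return true;
    }

    /** Number of daughter directories still present. */
    int daughtersRemaining() {
        return daughtersOnDisk.size();
    }
}
```

In this sketch a split that recorded both daughters and then PONR gets false back from rollback() and still has two daughters on disk, which mirrors what the second new unit test is described as checking.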



> Hole in split transaction rollback; edits to .META. need to be rolled back 
> even if it seems like they didn't make it
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3872
>                 URL: https://issues.apache.org/jira/browse/HBASE-3872
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.4
>
>         Attachments: 3872-v2.txt, 3872.txt
>
>
> Saw this interesting one on a cluster of ours.  The cluster was configured 
> with too few handlers, so we saw lots of the phenomenon where actions were 
> queued but, by the time they got into the server and it tried to respond to 
> the client, the client had disconnected because of the 60-second timeout.  
> Well, the meta edits for a split were queued at the regionserver carrying 
> .META. and by the time it went to write back, the client had gone (the first 
> insert of parent offline with daughter regions added as info:splitA and 
> info:splitB).  The client presumed the edits failed and 'successfully' rolled 
> back the transaction (failing to undo .META. edits thinking they didn't go 
> through).
> A few minutes later the .META. scanner on master runs.  It sees 'no 
> references' in daughters -- the daughters had been cleaned up as part of the 
> split transaction rollback -- so it thinks it's safe to delete the parent.
> Two things:
> + Tighten up check in master... need to check daughter region at least exists 
> and possibly the daughter region has an entry in .META.
> + Depending on which edit fails, schedule rollback edits even though it will 
> seem like they didn't go through.
> This is a pretty critical one.
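
The tightened master-side check proposed in the first bullet can be sketched as follows. This is only an illustration under assumptions: the class, the Daughter fields, and safeToDeleteParent are hypothetical names, not the CatalogJanitor API, and the optional ".META. entry" check is not modelled.

```java
/** Hypothetical sketch of the parent-cleanup decision; not actual HBase code. */
class ParentCleanupSketch {
    /** What the master knows about one daughter region (illustrative fields). */
    static final class Daughter {
        final String name;
        final boolean dirExistsInFs;          // daughter directory is present in the filesystem
        final boolean hasReferencesToParent;  // daughter still holds reference files to the parent

        Daughter(String name, boolean dirExistsInFs, boolean hasReferencesToParent) {
            this.name = name;
            this.dirExistsInFs = dirExistsInFs;
            this.hasReferencesToParent = hasReferencesToParent;
        }
    }

    /**
     * A daughter only counts as finished with the parent if it exists AND holds
     * no references. A missing daughter also shows 'no references', which is
     * the hole described in this issue.
     */
    static boolean daughterDone(Daughter d) {
        return d.dirExistsInFs && !d.hasReferencesToParent;
    }

    /** The parent may be deleted only when both daughters are done with it. */
    static boolean safeToDeleteParent(Daughter a, Daughter b) {
        return daughterDone(a) && daughterDone(b);
    }
}
```

With this check, a daughter that was removed by a rollback (dirExistsInFs == false) blocks deletion of the parent instead of being mistaken for a daughter that has finished compacting away its references.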

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
