[ 
https://issues.apache.org/jira/browse/HBASE-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068823#comment-15068823
 ] 

Enis Soztutar commented on HBASE-14987:
---------------------------------------

Thanks Ted and Stephen for working on this. 

To summarize the issue and some of the discussion so far, in case you have not 
been following closely: the root cause is that when HBCK decides to fix an 
overlap, it creates a new region and moves all the files and folders, including 
{{recovered.edits}}, from the old overlapping regions within the range into the 
new region. Moving the data files is fine. However, when recovered.edits is 
moved to the new region, replaying the compaction markers throws 
WrongRegionException. The other edits are already skipped (in 1.0+) if the 
region names do not match in log split.

The compaction marker replay is used by recovered.edits through regular log 
split, through distributed log replay, or through region replica replication 
for secondary regions (where they replay the compaction from the primary). In 
the log split case, we want to skip the edits (due to the HBCK case), but for 
secondary region replication we still want to throw the exception if the 
regions do not match. 
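
As a rough illustration of the two behaviours described above, here is a minimal, self-contained sketch. It is not HBase code: the class, the enums, and the use of IllegalStateException as a stand-in for WrongRegionException are all assumptions made for the example.

```java
import java.util.Objects;

/** Hypothetical model of the skip-vs-throw policy; not HBase's actual API. */
final class CompactionMarkerPolicy {
  /** Where the compaction marker comes from (names are assumptions). */
  enum Source { RECOVERED_EDITS, SECONDARY_REPLICA }

  /** What the caller should do with the marker. */
  enum Action { REPLAY, SKIP }

  static Action decide(Source source, String markerEncodedRegion, String thisEncodedRegion) {
    if (Objects.equals(markerEncodedRegion, thisEncodedRegion)) {
      // The marker belongs to this region: replay it on either path.
      return Action.REPLAY;
    }
    if (source == Source.RECOVERED_EDITS) {
      // The file may have been moved here by HBCK from an overlapping region,
      // so a mismatch is expected and the marker is simply ignored.
      return Action.SKIP;
    }
    // On the replica path a mismatch is a real error and must stay loud.
    // IllegalStateException stands in for WrongRegionException here.
    throw new IllegalStateException("Compaction marker for region " + markerEncodedRegion
        + " does not match this region: " + thisEncodedRegion);
  }
}
```

With the region names from the report, a marker for d389c70fde9ec07971d0cfd20ef8f575 found in the recovered.edits of fa8a526f2578eb3630bb08a4b1648f5d would be skipped, while the same mismatch on the replica path would still throw.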

Now, coming to the patch, instead of this: 
{code}
+                replayWALCompactionMarker(compaction, false, true, Long.MAX_VALUE,
+                  !checkRowWithinBoundary);
{code}
can we do this: 
{code}
+                if (checkRowWithinBoundary) {
+                  replayWALCompactionMarker(compaction, false, true, Long.MAX_VALUE);
+                }
{code}
Sending a boolean to replayWALCompactionMarker(), which will fail every time, 
should be avoided. We should simply not call the method if that is the case. 
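
To make the design point concrete, here is a small sketch of the call-site guard, assuming a simplified region object where markers are reduced to encoded region names. Every name in it is hypothetical, and IllegalStateException again stands in for WrongRegionException.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of guarding at the call site; not HBase's actual code. */
final class CallSiteGuardSketch {
  private final String encodedName;
  /** Records which markers were replayed, for illustration only. */
  final List<String> replayed = new ArrayList<>();

  CallSiteGuardSketch(String encodedName) {
    this.encodedName = encodedName;
  }

  /** Strict replay: always validates the target region, no "skip the check" flag. */
  void replayCompactionMarker(String markerEncodedRegion) {
    if (!encodedName.equals(markerEncodedRegion)) {
      throw new IllegalStateException("Compaction marker targeted for region "
          + markerEncodedRegion + " does not match this region: " + encodedName);
    }
    replayed.add(markerEncodedRegion);
  }

  /** Recovered-edits path: the caller decides whether to call the method at all. */
  void replayRecoveredEdits(List<String> markerEncodedRegions) {
    for (String marker : markerEncodedRegions) {
      if (encodedName.equals(marker)) {
        replayCompactionMarker(marker);
      }
      // Otherwise the marker came from another region (e.g. moved by HBCK) and is skipped.
    }
  }
}
```

Keeping the replay method strict means the secondary replica path, which calls it directly, keeps its exception without any extra parameter.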

The new test case uses region replica replication via secondary regions. 
Ideally, however, we would like to test the compaction replay through 
recovered.edits, which is not related to secondary replicas. 

> Compaction marker whose region name doesn't match current region's needs to 
> be handled
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-14987
>                 URL: https://issues.apache.org/jira/browse/HBASE-14987
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Stephen Yuan Jiang
>         Attachments: 14987-suggest.txt, 14987-v1.txt, 14987-v2.txt, 
> 14987-v2.txt
>
>
> One customer encountered the following error when replaying recovered edits, 
> leading to region open failure:
> {code}
> region=table1,d6b-2282-9223370590058224807-U-9856557-EJ452727-16313786400171,1449616291799.fa8a526f2578eb3630bb08a4b1648f5d., starting to roll back the global memstore size.
> org.apache.hadoop.hbase.regionserver.WrongRegionException: Compaction marker from WAL table_name: "table1"
> encoded_region_name: "d389c70fde9ec07971d0cfd20ef8f575"
> ...
> region_name: "table1,d6b-2282-9223370590058224807-U-9856557-EJ452727-16313786400171,1449089609367.d389c70fde9ec07971d0cfd20ef8f575."
>  targetted for region d389c70fde9ec07971d0cfd20ef8f575 does not match this region: {ENCODED => fa8a526f2578eb3630bb08a4b1648f5d, NAME => 'table1,d6b-2282-9223370590058224807-U-9856557-EJ452727-16313786400171,1449616291799.fa8a526f2578eb3630bb08a4b1648f5d.', STARTKEY => 'd6b-2282-9223370590058224807-U-9856557-EJ452727-16313786400171', ENDKEY => 'd76-2553-9223370588576178807-U-7416904-EK875822-17662180600000'}
>   at org.apache.hadoop.hbase.regionserver.HRegion.checkTargetRegion(HRegion.java:4592)
>   at org.apache.hadoop.hbase.regionserver.HRegion.replayWALCompactionMarker(HRegion.java:3831)
>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3747)
>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3601)
>   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:911)
>   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:789)
>   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:762)
>   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5774)
>   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5744)
> {code}
> This was likely caused by the following action of hbck:
> {code}
> 15/12/08 18:11:34 INFO util.HBaseFsck: [hbasefsck-pool1-t37] Moving files from hdfs://Zealand/hbase/data/default/table1/d389c70fde9ec07971d0cfd20ef8f575/recovered.edits into containing region hdfs://Zealand/hbase/data/default/table1/fa8a526f2578eb3630bb08a4b1648f5d/recovered.edits
> {code}
> The recovered.edits for d389c70fde9ec07971d0cfd20ef8f575 contained a compaction 
> marker that couldn't be replayed against fa8a526f2578eb3630bb08a4b1648f5d.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
