[ 
https://issues.apache.org/jira/browse/HBASE-27614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684396#comment-17684396
 ] 

Dong0829 edited comment on HBASE-27614 at 2/6/23 3:32 AM:
----------------------------------------------------------

Thanks [~zhangduo] for looking into this

 

Maybe I did not explain the issue clearly: changing the TTL does NOT cause the 
seqNum to go backward. The context is:
 # For some reason (we suspect WAL data loss or HFile loss during the HBase 
migration), the seqNum computed while opening the region was much smaller than 
{{seqnumDuringOpen}} in meta.
 # That mismatch causes the reopen procedure to keep reopening the same region 
even though it has already been opened successfully.
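
The loop described above can be modelled in a few lines. This is only a simplified sketch based on the description in this issue, not the actual HBase code: {{RegionState}}, {{regionOpened}}, {{needsReopen}} and {{reopenUntilDone}} are hypothetical names, and the attempt cap exists only so that the example terminates.

```java
public class ReopenLoopSketch {
    // Hypothetical stand-in for the master's in-memory state of one region.
    public static final class RegionState {
        long openSeqNum;
        public RegionState(long openSeqNum) { this.openSeqNum = openSeqNum; }
    }

    // Assumed behaviour: a reported seqNum smaller than the stored one only
    // produces a warning, and the stored value is left unchanged.
    public static void regionOpened(RegionState state, long reportedSeqNum) {
        if (reportedSeqNum < state.openSeqNum) {
            System.out.println("WARN: reported openSeqNum " + reportedSeqNum
                + " is less than current " + state.openSeqNum + ", ignoring");
        } else {
            state.openSeqNum = reportedSeqNum;
        }
    }

    // Assumed check: the region counts as reopened only if its seqNum moved
    // past the value recorded before the reopen was requested.
    public static boolean needsReopen(RegionState state, long seqNumBeforeReopen) {
        return state.openSeqNum <= seqNumBeforeReopen;
    }

    // Returns the number of attempts needed, or -1 if the cap was reached.
    public static int reopenUntilDone(RegionState state, long reportedSeqNum, int maxAttempts) {
        long before = state.openSeqNum;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            regionOpened(state, reportedSeqNum);
            if (!needsReopen(state, before)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Healthy case: the region server reports a larger seqNum.
        System.out.println(reopenUntilDone(new RegionState(10L), 20L, 5));
        // Stale case: meta holds a seqNum far above what the region reports.
        System.out.println(reopenUntilDone(new RegionState(1048581L), 20L, 5));
    }
}
```

In this model the healthy case finishes after one attempt, while the stale case never converges, because the stored seqNum is never lowered and the reopen check never passes.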

 

If the rule is "the seqNum should not go backward", then, as I said, for the open 
procedure ([https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/OpenRegionProcedure.java#L81]),
 should we consider the open to have succeeded when the openSeqNum is smaller 
than the one in the region state?

 

I am using hbase-2.4.13. Do you mean that, instead of setting the seqNum based 
on the WAL and HFiles, there is now a separate file that tracks the max 
sequence id? If so, may I know from which version?



> Region Reopen failure when the openNum has issue
> ------------------------------------------------
>
>                 Key: HBASE-27614
>                 URL: https://issues.apache.org/jira/browse/HBASE-27614
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Dong0829
>            Assignee: Dong0829
>            Priority: Major
>
> We hit this issue when changing the TTL of an HBase table: a lot of 
> regions kept reopening and tons of TRSPs were created. After troubleshooting, 
> we found a logic issue in the region reopen procedure.
> In the reopen process, the seqNum is checked to confirm whether the region 
> reopened successfully. If the recorded seqNum accidentally becomes bigger than 
> the one from the current HFiles and WAL (because of data loss), the region 
> gets stuck in an unnecessary close/open loop.
>  
> We should be able to improve the logic. More details:
> In regionOpenedWithoutPersistingToMeta, should we update the 
> OpenSeqNum only when the new one is bigger than the old one?
> As the region has already opened, we should update the OpenSeqNum whether it 
> is bigger or smaller; otherwise, we should fail the open instead of just 
> logging a WARN, right?
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/OpenRegionProcedure.java#L81]
>  
> This matters because in 
> checkReopened ([https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L312]),
>  if the seq is smaller, the region is returned and keeps reopening. So 
> we should update the logic in either regionOpenedWithoutPersistingToMeta or 
> checkReopened to make sure the region reopen works properly when the seqNum 
> has an issue.
>  
>  
> Reproduce steps:
>  
> 1. Create a test table (for example {{test}}) and put some data:
> {{create 'test', 'info'}}
> {{put 'test', 'fool', 'info:cat', 'test'}}
> 2. Manually update the {{info:seqnumDuringOpen}} column of one region row of 
> this test table in hbase:meta, for example:
> {{put 'hbase:meta', 'test,,1673406566311.3eb4d3e0258bd06f4639a595920c7673.', 
> 'info:seqnumDuringOpen', "\x00\x00\x00\x00\x00\x10\x00\x05"}}
> 3. Modify the table TTL:
> {{alter 'test', {NAME => 'info', TTL => '63244800'}}}
>  
> You will see the region keep reopening
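
For reference, the byte string written to {{info:seqnumDuringOpen}} in step 2 can be decoded as follows. This is a small sketch that assumes the column holds an 8-byte big-endian long, which is the layout HBase's {{Bytes.toBytes(long)}} produces; the class and method names are made up for the illustration.

```java
import java.nio.ByteBuffer;

public class SeqNumDecode {
    // Interprets 8 bytes as a big-endian long (ByteBuffer's default byte order).
    public static long decode(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    public static void main(String[] args) {
        // "\x00\x00\x00\x00\x00\x10\x00\x05" from the reproduce steps
        byte[] raw = {0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x05};
        System.out.println(decode(raw)); // prints 1048581 (0x100005)
    }
}
```

Under that assumption, the value injected into meta is 1048581, far above the sequence id a freshly created table with a single put would report on open, which is what triggers the reopen loop.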



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
