[ https://issues.apache.org/jira/browse/HBASE-27614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684396#comment-17684396 ]
Dong0829 edited comment on HBASE-27614 at 2/6/23 3:32 AM:
----------------------------------------------------------

Thanks [~zhangduo] for looking into this.

Maybe I did not explain the issue clearly: changing the TTL does NOT cause the seqNum to go backward. The context is:
# For some reason (we suspect WAL data loss or HFile loss during the HBase migration), the seqNum during region open was much smaller than {{seqnumDuringOpen}} in the meta.
# This mismatch causes the reopen to keep reopening the same region even though it has already opened successfully.

If "the seqNum should not go backward", then as I said, for the open procedure ([https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/OpenRegionProcedure.java#L81]), should we consider the open succeeded if the openSeqNum is smaller than the one in the region state?

I am using hbase-2.4.13. Do you mean that, instead of setting the seqNum based on the WAL and HFiles, there is now a separate file that tracks the max sequence id? If yes, may I know from which version?


was (Author: li0829):

Thanks [~zhangduo] for looking into this.

Maybe I did not explain the issue clearly: changing the TTL does NOT cause the seqNum to go backward. The context is:
# For some reason (we suspect WAL data loss or HFile log during the HBase migration), the seqNum during region open was much smaller than {{seqnumDuringOpen}} in the meta.
# This mismatch causes the reopen to keep reopening the same region even though it has already opened successfully.

If "the seqNum should not go backward", then as I said, for the open procedure ([https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/OpenRegionProcedure.java#L81]), should we consider the open succeeded if the openSeqNum is smaller than the one in the region state?

I am using hbase-2.4.13. Do you mean that, instead of setting the seqNum based on the WAL and HFiles, there is now a separate file that tracks the max sequence id?
If yes, may I know from which version?

> Region Reopen failure when the openNum has issue
> ------------------------------------------------
>
>                 Key: HBASE-27614
>                 URL: https://issues.apache.org/jira/browse/HBASE-27614
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Dong0829
>            Assignee: Dong0829
>            Priority: Major
>
> We hit this issue when changing the TTL of an HBase table: a lot of
> regions kept reopening and tons of TRSPs were created. After
> troubleshooting, we found a logic issue in the region reopen procedure.
> In the reopen process, the seqNum is checked to confirm whether the region
> reopened successfully. If the seqNum accidentally becomes bigger than that
> of the current HFiles and WAL (because of data loss), there will be an
> issue and an unnecessary loop of region close/open.
>
> We should be able to optimize the logic. More details:
> For regionOpenedWithoutPersistingToMeta, should we update the OpenSeqNum
> only when the new one is bigger than the old one?
> As the region is already opened, we should update the OpenSeqNum no matter
> whether it is bigger or smaller; otherwise, we should not just log a WARN
> but fail the open, right?
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/OpenRegionProcedure.java#L81]
>
> This matters because in
> checkReopened ([https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L312]),
> if the seq is smaller, the region is returned and keeps reopening. So
> we should update the logic in either regionOpenedWithoutPersistingToMeta or
> checkReopened to make sure the region reopen works properly when the seqNum
> has an issue.
>
>
> Reproduce steps:
>
> 1. Create a test table and put some data, for example:
> {{create 'test', 'info'}}
> {{put 'test', 'fool', 'info:cat', 'test'}}
> 2. Manually update one region row for this test table in hbase:meta on the
> {{info:seqnumDuringOpen}} column, for example:
> {{put 'hbase:meta', 'test,,1673406566311.3eb4d3e0258bd06f4639a595920c7673.', 'info:seqnumDuringOpen', "\x00\x00\x00\x00\x00\x10\x00\x05"}}
> 3. Modify the table TTL:
> alter 'test', {NAME => 'info', TTL => '63244800'}
>
> You will see the region keep reopening.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
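
The reopen loop described in the issue can be illustrated with a small standalone simulation. This is only a sketch of the behavior as the report describes it, not the actual HBase code: the class `ReopenLoopSketch`, its methods `regionOpened`, `needsReopen`, and `simulate`, and the simplified comparison rules are all assumptions made for illustration. It models a master-side record that only warns when a reported open sequence number is smaller than the recorded one, and a reopen check that treats a region as reopened only once the recorded number exceeds the value captured before the reopen.

```java
public class ReopenLoopSketch {
    // Hypothetical stand-in for the open sequence number the master holds for a region.
    static long recordedOpenSeqNum;

    // Assumed behavior, modeled on the report's description of
    // regionOpenedWithoutPersistingToMeta: a smaller reported sequence
    // number only produces a warning and is never recorded.
    static void regionOpened(long reportedSeqNum) {
        if (reportedSeqNum < recordedOpenSeqNum) {
            System.out.println("WARN: reported openSeqNum " + reportedSeqNum
                + " is smaller than recorded " + recordedOpenSeqNum);
        } else {
            recordedOpenSeqNum = reportedSeqNum;
        }
    }

    // Assumed behavior, modeled on the report's description of checkReopened:
    // the region still needs a reopen unless the recorded sequence number
    // moved past the value seen before the reopen started.
    static boolean needsReopen(long seqNumBeforeReopen) {
        return recordedOpenSeqNum <= seqNumBeforeReopen;
    }

    // Returns the number of open attempts until the reopen is accepted,
    // or -1 if it is still looping after maxAttempts.
    static int simulate(long seqNumInMeta, long seqNumFromStore, int maxAttempts) {
        recordedOpenSeqNum = seqNumInMeta;
        long before = recordedOpenSeqNum;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Each open reports a sequence number slightly above what the store yields.
            regionOpened(seqNumFromStore + attempt);
            if (!needsReopen(before)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Consistent state: meta and store agree, so one reopen is enough.
        System.out.println("consistent: " + simulate(100L, 100L, 5));
        // Inflated meta value 0x100005, matching the bytes written in the
        // reproduce steps, while the store only yields a small number.
        System.out.println("inflated meta: " + simulate(0x100005L, 10L, 5));
    }
}
```

In this sketch the consistent case returns 1 and the inflated case returns -1, which mirrors the endless close/open cycle in the report. Recording the reported number unconditionally in `regionOpened` would not by itself end the loop here, because `needsReopen` would still compare against the inflated earlier value, so both options raised in the issue would need to be weighed against the real code.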