[ 
https://issues.apache.org/jira/browse/HBASE-29282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949683#comment-17949683
 ] 

Duo Zhang commented on HBASE-29282:
-----------------------------------

Some findings below.

I used the hfile tool to print out the related meta entries.

There is a CLOSED state entry in hbase:meta for region 
6a98dc86a491041b8d3ac584ac73c0a0, which should have been merged and removed. 
There is also a delete family entry in another hfile, but it was most likely 
added by the catalog janitor, since it has a much later timestamp.
{noformat}
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626716.6a98dc86a491041b8d3ac584ac73c0a0./info:regioninfo/1746029312379/Put/vlen=74/seqid=3530
 V: 
PBUF\x08\x9C\x86\xAA\xBB\xE82\x12'\x0A\x07default\x12\x1CIntegrationTestBigLinkedList\x1A\x08\x99\x99\x99\x99\x99\x99\x99\x99"\x04\xA2!RV(\x000\x008\x00
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626716.6a98dc86a491041b8d3ac584ac73c0a0./info:state/1746029312379/Put/vlen=6/seqid=3530
 V: CLOSED
{noformat}
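
For comparing entries like the ones above, the key lines can be split into 
their fields. This is only a minimal sketch: the layout 
row/family:qualifier/timestamp/type/vlen=N/seqid=N is inferred from the 
output shown in this comment, and MetaKey and parse_key are hypothetical 
helper names, not part of HBase. It also assumes the qualifier itself contains 
no '/'.

```python
from typing import NamedTuple


class MetaKey(NamedTuple):
    row: str
    column: str      # family:qualifier, e.g. info:state
    timestamp: int   # cell timestamp in milliseconds
    kind: str        # cell type as printed, e.g. Put
    vlen: int
    seqid: int


def parse_key(line: str) -> MetaKey:
    # Split from the right so any '/' inside the row key stays in the row.
    row, column, ts, kind, vlen, seqid = line.strip().rsplit("/", 5)
    return MetaKey(
        row=row,
        column=column,
        timestamp=int(ts),
        kind=kind,
        vlen=int(vlen.split("=", 1)[1]),
        seqid=int(seqid.split("=", 1)[1]),
    )


# Key of the CLOSED state entry, copied from the output above.
closed_key = parse_key(
    r"IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,"
    r"1746028626716.6a98dc86a491041b8d3ac584ac73c0a0."
    r"/info:state/1746029312379/Put/vlen=6/seqid=3530"
)
print(closed_key.column, closed_key.timestamp, closed_key.seqid)
```

With the fields separated, timestamps and sequence ids of different entries 
can be compared directly instead of by eye.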


These are the entries for the new region, 24435a6eefc045cf36ddff9a30409ff1. The 
first two entries, 'info:merge0000' and 'info:merge0001', should be written 
together with the delete of the parent regions.
{noformat}
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:merge0000/1746029312352/Put/vlen=74/seqid=3531
 V: 
PBUF\x08\x9C\x86\xAA\xBB\xE82\x12'\x0A\x07default\x12\x1CIntegrationTestBigLinkedList\x1A\x08\x99\x99\x99\x99\x99\x99\x99\x99"\x04\xA2!RV(\x000\x008\x00
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:merge0001/1746029312352/Put/vlen=74/seqid=3531
 V: 
PBUF\x08\x9C\x86\xAA\xBB\xE82\x12'\x0A\x07default\x12\x1CIntegrationTestBigLinkedList\x1A\x04\xA2!RV"\x08\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xAA(\x000\x008\x00
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:regioninfo/1746032335156/Put/vlen=78/seqid=6535
 V: 
PBUF\x08\x9D\x86\xAA\xBB\xE82\x12'\x0A\x07default\x12\x1CIntegrationTestBigLinkedList\x1A\x08\x99\x99\x99\x99\x99\x99\x99\x99"\x08\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xAA(\x000\x008\x00
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:regioninfo/1746032334918/Put/vlen=78/seqid=6517
 V: 
PBUF\x08\x9D\x86\xAA\xBB\xE82\x12'\x0A\x07default\x12\x1CIntegrationTestBigLinkedList\x1A\x08\x99\x99\x99\x99\x99\x99\x99\x99"\x08\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xAA(\x000\x008\x00
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:regioninfo/1746032334774/Put/vlen=78/seqid=6484
 V: 
PBUF\x08\x9D\x86\xAA\xBB\xE82\x12'\x0A\x07default\x12\x1CIntegrationTestBigLinkedList\x1A\x08\x99\x99\x99\x99\x99\x99\x99\x99"\x08\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xAA(\x000\x008\x00
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:seqnumDuringOpen/1746032335156/Put/vlen=8/seqid=6535
 V: \x00\x00\x00\x00\x00\x03\xA4n
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:seqnumDuringOpen/1746032245316/Put/vlen=8/seqid=6393
 V: \x00\x00\x00\x00\x00\x03\xA4k
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:seqnumDuringOpen/1746032215791/Put/vlen=8/seqid=6262
 V: \x00\x00\x00\x00\x00\x03\xA4h
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:server/1746032335156/Put/vlen=12/seqid=6535
 V: data01:16020
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:server/1746032245316/Put/vlen=12/seqid=6393
 V: data01:16020
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:server/1746032215791/Put/vlen=12/seqid=6262
 V: data01:16020
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:serverstartcode/1746032335156/Put/vlen=8/seqid=6535
 V: \x00\x00\x01\x96\x87\xA1"\xA8
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:serverstartcode/1746032245316/Put/vlen=8/seqid=6393
 V: \x00\x00\x01\x96\x87\xA1"\xA8
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:serverstartcode/1746032215791/Put/vlen=8/seqid=6262
 V: \x00\x00\x01\x96\x87\xA1"\xA8
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:sn/1746032334918/Put/vlen=26/seqid=6517
 V: data01,16020,1746032206504
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:sn/1746032334595/Put/vlen=26/seqid=6469
 V: data01,16020,1746032206504
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:sn/1746032245000/Put/vlen=26/seqid=6362
 V: data01,16020,1746032206504
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:state/1746032335156/Put/vlen=4/seqid=6535
 V: OPEN
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:state/1746032334918/Put/vlen=7/seqid=6517
 V: OPENING
K: 
IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./info:state/1746032334774/Put/vlen=6/seqid=6484
 V: CLOSED
{noformat}

The sequence ids are in the expected order: the CLOSED state of 
6a98dc86a491041b8d3ac584ac73c0a0 has sequence id 3530, while the merge entries 
for the new region have sequence id 3531. The problem is the timestamps. The 
entries for the new region should have the greater timestamp, but the 
timestamp of the CLOSED state of 6a98dc86a491041b8d3ac584ac73c0a0 is 
1746029312379, while the timestamp of the merge entries of 
24435a6eefc045cf36ddff9a30409ff1 is 1746029312352.

I think this is why the deletion of the parent regions does not help: it has a 
smaller timestamp than the CLOSED state entry, and a delete only masks cells 
whose timestamp is not greater than its own...
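
The effect can be illustrated with a deliberately simplified model of delete 
masking, in which only timestamps decide and sequence ids are ignored. This is 
a sketch, not HBase code: Cell and is_masked are hypothetical names, and the 
timestamp and sequence id of the parent delete are an assumption (taken to be 
the same as those of the 'info:merge0000' entry, 1746029312352 and 3531, since 
the delete is expected to be written together with it).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    timestamp: int  # milliseconds
    seqid: int


def is_masked(put: Cell, family_delete: Cell) -> bool:
    # Simplified rule: a family delete hides puts whose timestamp is not
    # greater than the delete's timestamp; sequence ids play no role here.
    return put.timestamp <= family_delete.timestamp


# CLOSED state put of the parent region, values from the hfile output.
closed_put = Cell(timestamp=1746029312379, seqid=3530)
# Assumed values for the delete of the parent region (see lead-in).
parent_delete = Cell(timestamp=1746029312352, seqid=3531)

print(is_masked(closed_put, parent_delete))               # False
print(closed_put.timestamp - parent_delete.timestamp)     # 27
```

Under these assumptions the CLOSED put is 27 ms newer than the delete, so it 
stays visible even though the delete has the larger sequence id.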

Let me dig more into why this could happen...

> Regions are left in CLOSED state after merging
> ----------------------------------------------
>
>                 Key: HBASE-29282
>                 URL: https://issues.apache.org/jira/browse/HBASE-29282
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2, Region Assignment
>            Reporter: Duo Zhang
>            Priority: Major
>
> When running ITBLL, some regions were left in CLOSED state for a long time 
> and were finally cleaned up by CatalogJanitor.
> After checking, the regions had been merged and should have been removed 
> from hbase:meta, but it seems they were still present in the hbase:meta 
> table in CLOSED state.
> Need to dig more.
> {noformat}
> 2025-05-01T00:08:32,903 INFO  [PEWorker-15] procedure2.ProcedureExecutor: 
> Finished pid=3512, state=SUCCESS, hasLock=false; MergeTableRegionsProcedure 
> table=IntegrationTestBigLinkedList, 
> regions=[6a98dc86a491041b8d3ac584ac73c0a0, c9f07f77792feb0d8a845d6d9751f048], 
> force=false in 734 msec
> 2025-05-01T00:11:26,333 WARN  [master/meta02:16000.Chore.1] 
> janitor.CatalogJanitor: 
> overlap=IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626716.6a98dc86a491041b8d3ac584ac73c0a0./IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1.,
>  
> overlap=IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./IntegrationTestBigLinkedList,\xA2!RV,1746028626716.c9f07f77792feb0d8a845d6d9751f048.
> 2025-05-01T00:41:40,856 WARN  [master/meta02:16000.Chore.1] 
> janitor.CatalogJanitor: 283c738f170f361157b470868f6ad89., 
> overlap=IntegrationTestBigLinkedList,\x91\x10\xA3\x07\x03\xAC\xC7\xC3\xCCY\xAE\xE4!1\xD1i,1746029042178.815020ca73a2679bc0c0a298e4dddfda./IntegrationTestBigLinkedList,\x91\x10\xA3\x07\x03\xAC\xC7\xC3\xCCY\xAE\xE4!1\xD1i,1746029042179.278a2eeee359488f859ac5334ee3cde0.,
>  
> overlap=IntegrationTestBigLinkedList,\x91\x10\xA3\x07\x03\xAC\xC7\xC3\xCCY\xAE\xE4!1\xD1i,1746029042179.278a2eeee359488f859ac5334ee3cde0./IntegrationTestBigLinkedList,\x95U\x0D9}\xAB\xE1\x98\x80w\xED\xA7+\xF9\xA4\xED,1746029042178.b64120d20856552cd7d154b63bd2ce81.,
>  
> overlap=IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626716.6a98dc86a491041b8d3ac584ac73c0a0./IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1.,
>  
> overlap=IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626717.24435a6eefc045cf36ddff9a30409ff1./IntegrationTestBigLinkedList,\xA2!RV,1746028626716.c9f07f77792feb0d8a845d6d9751f048.
> 2025-05-01T00:42:00,853 INFO  [PEWorker-12] procedure.FlushRegionProcedure: 
> State of region {ENCODED => 6a98dc86a491041b8d3ac584ac73c0a0, NAME => 
> 'IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626716.6a98dc86a491041b8d3ac584ac73c0a0.',
>  STARTKEY => '\x99\x99\x99\x99\x99\x99\x99\x99', ENDKEY => '\xA2!RV'} is not 
> OPEN or in transition. Skip pid=5810, ppid=5789, state=RUNNABLE, 
> hasLock=true; org.apache.hadoop.hbase.master.procedure.FlushRegionProcedure 
> ...
> 2025-05-01T00:44:32,339 INFO  [PEWorker-3] 
> procedure.MasterProcedureScheduler: Took xlock for pid=5964, ppid=5943, 
> state=RUNNABLE, hasLock=false; SnapshotRegionProcedure 
> 6a98dc86a491041b8d3ac584ac73c0a0
> 2025-05-01T00:44:32,340 WARN  [PEWorker-3] procedure.SnapshotRegionProcedure: 
> pid=5964, ppid=5943, state=RUNNABLE, hasLock=true; SnapshotRegionProcedure 
> 6a98dc86a491041b8d3ac584ac73c0a0 can not run currently because region state 
> of 
> IntegrationTestBigLinkedList,\x99\x99\x99\x99\x99\x99\x99\x99,1746028626716.6a98dc86a491041b8d3ac584ac73c0a0.
>  is CLOSED, wait 1000 ms to retry
> {noformat}
> {noformat}
> 2025-05-01 00:27:59,824 WARN [RPCClient-NioEventLoopGroup-1-2] 
> org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator: Failed to locate 
> region in 'IntegrationTestBigLinkedList', 
> row='\xA6\x8B\x9E\xC1\xA98&K}g+7N/\xA1\x05', locateType=CURRENT
> org.apache.hadoop.hbase.HBaseIOException: No location found for 
> 'IntegrationTestBigLinkedList', row='\xA6\x8B\x9E\xC1\xA98&K}g+7N/\xA1\x05', 
> locateType=CURRENT
>       at 
> org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.onScanNext(AsyncNonMetaRegionLocator.java:322)
>       at 
> org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator$1.onNext(AsyncNonMetaRegionLocator.java:437)
>       at 
> org.apache.hadoop.hbase.client.AsyncScanSingleRegionRpcRetryingCaller.onComplete(AsyncScanSingleRegionRpcRetryingCaller.java:535)
>       at 
> org.apache.hadoop.hbase.client.AsyncScanSingleRegionRpcRetryingCaller.start(AsyncScanSingleRegionRpcRetryingCaller.java:636)
>       at 
> org.apache.hadoop.hbase.client.AsyncRpcRetryingCallerFactory$ScanSingleRegionCallerBuilder.start(AsyncRpcRetryingCallerFactory.java:322)
>       at 
> org.apache.hadoop.hbase.client.AsyncClientScanner.startScan(AsyncClientScanner.java:208)
>       at 
> org.apache.hadoop.hbase.client.AsyncClientScanner.lambda$openScanner$2(AsyncClientScanner.java:268)
>       at 
> org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:71)
>       at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
>       at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
>       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>       at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
>       at 
> org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.lambda$call$4(AsyncSingleRequestRpcRetryingCaller.java:92)
>       at 
> org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:71)
>       at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
>       at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
>       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>       at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
>       at 
> org.apache.hadoop.hbase.client.AsyncClientScanner.lambda$callOpenScanner$0(AsyncClientScanner.java:187)
>       at 
> org.apache.hbase.thirdparty.com.google.protobuf.RpcUtil$1.run(RpcUtil.java:56)
>       at 
> org.apache.hbase.thirdparty.com.google.protobuf.RpcUtil$1.run(RpcUtil.java:47)
>       at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:400)
>       at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:430)
>       at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425)
>       at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:117)
>       at org.apache.hadoop.hbase.ipc.Call.setResponse(Call.java:149)
>       at 
> org.apache.hadoop.hbase.ipc.RpcConnection.finishCall(RpcConnection.java:396)
>       at 
> org.apache.hadoop.hbase.ipc.RpcConnection.readResponse(RpcConnection.java:461)
>       at 
> org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.readResponse(NettyRpcDuplexHandler.java:125)
>       at 
> org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelRead(NettyRpcDuplexHandler.java:140)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>       at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
>       at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>       at 
> org.apache.hbase.thirdparty.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:289)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
>       at 
> org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
>       at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>       at 
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>       at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>       at java.base/java.lang.Thread.run(Thread.java:840)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
