[jira] [Commented] (HBASE-20616) TruncateTableProcedure is stuck in retry loop in TRUNCATE_TABLE_CREATE_FS_LAYOUT state

[email protected] (Jira) Tue, 23 Feb 2021 18:00:07 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-20616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289522#comment-17289522
 ]


[email protected] commented on HBASE-20616:
----------------------------------------------------

Hi,

We are seeing the same problem on HBase 1.2.0. The master is stuck in the 
retry loop indefinitely. How do we break this loop and apply the patch?

I was trying to run truncate_preserve 'table'.

Below are the errors:

2020-10-21 11:51:39,428 INFO org.apache.hadoop.hbase.master.HMaster: 
Client=research//10.19.25.18 truncate table
2020-10-21 11:51:39,876 INFO org.apache.hadoop.hbase.MetaTableAccessor: Deleted 
[{ENCODED => 190c395e0419552552ec2472c212109b, NAME => ' '
2020-10-21 12:55:49,600 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/hbase/.tmp/data/default/table_name/b533d42b2b0da96cd7f960619b8ce6f1/.regioninfo
 (inode 1164868456): File does not exist. [Lease.  Holder: 
DFSClient_NONMAPREDUCE_-316682701_1, pending creates: 3]

2020-10-21 12:55:49,626 WARN 
org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure: Retriable 
error trying to truncate table='' state=TRUNCATE_TABLE_CREATE_FS_LAYOUT
java.io.IOException: java.util.concurrent.ExecutionException: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/hbase/.tmp/data/default/''/b533d42b2b0da96cd7f960619b8ce6f1/.regioninfo (inode 
1164868456): File does not exist. [Lease.  Holder: 
DFSClient_NONMAPREDUCE_-316682701_1, pending creates: 3]

2020-10-21 13:44:58,519 WARN 
org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure: Retriable 
error trying to truncate table='' state=TRUNCATE_TABLE_CREATE_FS_LAYOUT
java.io.IOException: java.util.concurrent.ExecutionException: 
java.io.IOException: The specified region already exists on disk: 
hdfs://nameservice1/hbase/.tmp/data/default/''/eef3595091cdf51af7488dca37398617
        at 
org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:186)
        at 
org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:141)
        at 
org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:118)
        at 
org.apache.hadoop.hbase.master.procedure.CreateTableProcedure$3.createHdfsRegions(CreateTableProcedure.java:370)
        at 
org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.createFsLayout(CreateTableProcedure.java:389)
        at 
org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.createFsLayout(CreateTableProcedure.java:363)
        at 
org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:114)
        at 
org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:47)
        at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:498)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1061)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:856)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:809)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:75)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:495)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: The 
specified region already exists on disk: 
hdfs://nameservice1/hbase/.tmp/data/default/''/eef3595091cdf51af7488dca37398617
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at 
org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:180)
        ... 14 more
Caused by: java.io.IOException: The specified region already exists on disk: 
hdfs://nameservice1/hbase/.tmp/data/default/''/eef3595091cdf51af7488dca37398617
        at 
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createRegionOnFileSystem(HRegionFileSystem.java:904)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6380)
        at 
org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegion(ModifyRegionUtils.java:205)
        at 
org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:173)
        at 
org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:170)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)




$ hdfs dfs -ls hdfs://nameservice1/hbase/.tmp/data/default/*
Found 4 items
drwxr-xr-x   - hbase hbase          0 2020-10-21 12:55 
hdfs://nameservice1/hbase/.tmp/data/default/<table_name>/.tabledesc
drwxr-xr-x   - hbase hbase          0 2020-10-21 12:55 
hdfs://nameservice1/hbase/.tmp/data/default/<table_name>/.tmp
drwxr-xr-x   - hbase hbase          0 2020-10-21 12:55 
hdfs://nameservice1/hbase/.tmp/data/default/<table_name>/b533d42b2b0da96cd7f960619b8ce6f1
drwxr-xr-x   - hbase hbase          0 2020-10-21 12:55 
hdfs://nameservice1/hbase/.tmp/data/default/<table_name>/eef3595091cdf51af7488dca37398617




> TruncateTableProcedure is stuck in retry loop in 
> TRUNCATE_TABLE_CREATE_FS_LAYOUT state
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-20616
>                 URL: https://issues.apache.org/jira/browse/HBASE-20616
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>         Environment: HDP-2.5.3
>            Reporter: Toshihiro Suzuki
>            Assignee: Toshihiro Suzuki
>            Priority: Major
>             Fix For: 2.1.0
>
>         Attachments: 20616.master.004.patch, HBASE-20616.master.001.patch, 
> HBASE-20616.master.002.patch, HBASE-20616.master.003.patch, 
> HBASE-20616.master.004.patch
>
>
> At first, TruncateTableProcedure failed to write some files to HDFS in 
> TRUNCATE_TABLE_CREATE_FS_LAYOUT state for some reason.
> {code:java}
> 2018-05-15 08:00:25,346 WARN  [ProcedureExecutorThread-8] 
> procedure.TruncateTableProcedure: Retriable error trying to truncate 
> table=<namespace>:<table> state=TRUNCATE_TABLE_CREATE_FS_LAYOUT
> java.io.IOException: java.util.concurrent.ExecutionException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /apps/hbase/data/.tmp/data/<namespace>/<table>/<region>/.regioninfo could 
> only be replicated to 0 nodes instead of minReplication (=1).  There are <the 
> number of DNs> datanode(s) running and no node(s) are excluded in this 
> operation.
> ...
> {code}
> However, it seems that some of the files were written to HDFS successfully 
> at that point. TruncateTableProcedure then got stuck in a retry loop in the 
> TRUNCATE_TABLE_CREATE_FS_LAYOUT state, and the following log messages were 
> shown repeatedly in the master log:
> {code:java}
> 2018-05-15 08:00:25,463 WARN  [ProcedureExecutorThread-8] 
> procedure.TruncateTableProcedure: Retriable error trying to truncate 
> table=<namespace>:<table> state=TRUNCATE_TABLE_CREATE_FS_LAYOUT
> java.io.IOException: java.util.concurrent.ExecutionException: 
> java.io.IOException: The specified region already exists on disk: 
> hdfs://<name>/apps/hbase/data/.tmp/data/<namespace>/<table>/<region>
> ...
> {code}
> It seems like this is because TruncateTableProcedure tried to write the files 
> that were written successfully in the first try.
> I think we need to delete all the files and directories that are written 
> successfully in the previous try before retrying the 
> TRUNCATE_TABLE_CREATE_FS_LAYOUT state.
> Actually, this issue was observed in HDP-2.5.3, but I think the upstream has 
> the same issue. Also, it looks to me that CreateTableProcedure has a similar 
> issue.
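
The fix proposed in the description above (clear whatever the previous attempt
left behind before retrying the create step) can be illustrated with a small
standalone sketch. This is not the HBase patch and does not use any HBase or
HDFS API: it works on the local filesystem through java.nio.file, and the
names RetrySafeLayout, createRegionDir, deleteRecursively and createFsLayout
are hypothetical helpers chosen for illustration only. It only shows why
deleting the leftover temporary table directory makes the step safe to retry.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not HBase code.
class RetrySafeLayout {

    // Imitates the failing behaviour from the logs: creating a region
    // directory that already exists is treated as an error.
    static void createRegionDir(Path tableTmpDir, String encodedRegionName) throws IOException {
        Path regionDir = tableTmpDir.resolve(encodedRegionName);
        if (Files.exists(regionDir)) {
            throw new IOException("The specified region already exists on disk: " + regionDir);
        }
        Files.createDirectories(regionDir);
        Files.createFile(regionDir.resolve(".regioninfo"));
    }

    // Removes a directory tree if present; children are deleted before parents.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Retry-safe variant: leftovers from an earlier, partially successful
    // attempt are removed first, so the create step always starts clean.
    static void createFsLayout(Path tableTmpDir, List<String> regions) throws IOException {
        deleteRecursively(tableTmpDir);
        Files.createDirectories(tableTmpDir);
        for (String region : regions) {
            createRegionDir(tableTmpDir, region);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("layout-sketch");
        Path tableTmpDir = root.resolve("table_name");

        // Simulate a first attempt that wrote one region and then failed.
        createRegionDir(tableTmpDir, "eef3595091cdf51af7488dca37398617");

        // The retry succeeds because the stale directory is cleared first.
        createFsLayout(tableTmpDir, Arrays.asList(
                "b533d42b2b0da96cd7f960619b8ce6f1",
                "eef3595091cdf51af7488dca37398617"));
        System.out.println("layout created under " + tableTmpDir);

        deleteRecursively(root);
    }
}
```

Without the deleteRecursively call at the top of createFsLayout, the second
attempt in main would fail with the same "already exists on disk" message
that repeats in the master log above.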



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
