[ https://issues.apache.org/jira/browse/HBASE-25205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218886#comment-17218886 ]
Junhong Xu edited comment on HBASE-25205 at 10/22/20, 9:14 AM:
---------------------------------------------------------------
{quote}u mean the RS which is doing WAL split?{quote}
Yeah. It is triggered when all of the following conditions hold:
1. splitting WALs directly to HFiles is enabled
2. the flag for skipping corrupted files is enabled: hbase.hregion.edits.replay.skip.errors=true
3. the region is opened many times
4. there are corrupted files

If the skip-corrupted-files flag is enabled, we append a timestamp suffix to each corrupted file's name, while normal hfiles are simply renamed into place. The corrupted files get a further timestamp suffix appended every time the region opens, so after the region has been opened many times the file name becomes too long to rename (the path component limit is 255 in the log above). Recovered edits are deleted after a successful open, but if the open fails many times before they are deleted, isn't the logic the same as above?
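The suffix growth described above can be sketched in a few lines of Java. This is an illustrative sketch, not HBase source: the class and the opensUntilTooLong helper are hypothetical names for this example; the 255-character constant is HDFS's default dfs.namenode.fs-limits.max-component-length; the 32-character base name is taken from the log in this issue; and it assumes System.currentTimeMillis() yields 13 digits (true for current epochs).

```java
// Illustrative sketch only, not HBase source: models how appending a
// "." + millisecond-timestamp suffix on every failed region open pushes a
// recovered-hfile name past HDFS's 255-character path component limit.
public class SuffixGrowthSketch {

    // HDFS default dfs.namenode.fs-limits.max-component-length
    static final int MAX_COMPONENT_LENGTH = 255;

    // Hypothetical helper: counts how many suffix appends it takes for the
    // name to exceed the limit. Assumes a 13-digit epoch-millis timestamp,
    // so each append adds 14 characters ("." plus 13 digits).
    static int opensUntilTooLong(String baseName) {
        String name = baseName;
        int opens = 0;
        while (name.length() <= MAX_COMPONENT_LENGTH) {
            name = name + "." + System.currentTimeMillis();
            opens++;
        }
        return opens;
    }

    public static void main(String[] args) {
        // 32-character encoded file name, as in the log in this issue.
        // 32 + 16 * 14 = 256 > 255, so the 16th failed open crosses the
        // limit, matching the 16 timestamp suffixes visible in the log.
        String base = "65378558b0444c27a9d21fb0f4e4293f";
        System.out.println("Limit exceeded after " + opensUntilTooLong(base) + " opens");
    }
}
```

Each suffix adds 14 characters, so even a short base name survives only a dozen or so failed opens before rename() starts throwing PathComponentTooLongException.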
And the case we encountered is when sinking to HFiles directly; the recovered-edits case should be much rarer.

> Corrupted hfiles append timestamp every time the region is trying to open
> -------------------------------------------------------------------------
>
>                 Key: HBASE-25205
>                 URL: https://issues.apache.org/jira/browse/HBASE-25205
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Junhong Xu
>            Assignee: Junhong Xu
>            Priority: Major
>
> When the RS crashes, we replay its WALs to generate recovered edits or HFiles directly. If the RS replaying the WAL crashes as well, the file being written may be corrupted. In some cases we may want to move on (e.g. when sinking to hfiles, since we still have the WAL and replaying it again is OK), so we move the file aside by renaming it with an extra timestamp suffix. But when the region is opened again, the corrupted file still can't be opened, and it is renamed with yet another timestamp suffix. After some rounds of this, the file name becomes too long to rename. The log looks like this:
> {code:java}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException): The maximum path component name limit of 65378558b0444c27a9d21fb0f4e4293f.1602831270772.1602831291050.1602831296855.1602831408803.1602831493989.1602831584077.1602831600838.1602831659805.1602831736374.1602831738002.1602831959867.1602831979707.1602832095288.1602832103908.1602832538224.1602833079431 in directory /hbase/XXX/data/default/IntegrationTestBigLinkedList/aa376ecf026a5e63d0703384e34ec6aa/meta/recovered.hfiles is exceeded: limit=255 length=256
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxComponentLength(FSDirectory.java:1230)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.verifyFsLimitsForRename(FSDirRenameOp.java:98)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:191)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:493)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:62)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3080)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:1113)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:665)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2742)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1504)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1435)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>         at com.sun.proxy.$Proxy17.rename(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:504)
>         at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:249)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:107)
>         at com.sun.proxy.$Proxy18.rename(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
>         at com.sun.proxy.$Proxy21.rename(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1996)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:605)
>         at org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:226)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)