[
https://issues.apache.org/jira/browse/HBASE-25205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218847#comment-17218847
]
Anoop Sam John commented on HBASE-25205:
----------------------------------------
bq. If the replaying WAL RS crashed
You mean the RS which is doing the WAL split?
bq. the file just writing to may be corrupted
You mean in the case of splitting the WAL directly to HFiles? In the case of splitting to recovered-edits files, we throw away the old incomplete file and create a new one, right?
Can you please explain more about which case has the issue? Sorry, I am not able to follow fully.
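To be concrete about the recovered-edits path: a minimal sketch (hypothetical standalone Java, not the actual HBase WAL-split code) of the throw-away-and-recreate pattern, where a crashed attempt leaves only a temp file that the retry deletes before rewriting and atomically publishing:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RecoveredEditsSketch {
  // Replay WAL edits into a recovered-edits file. An earlier crashed attempt
  // leaves only the ".temp" file behind, which the retry deletes and redoes,
  // so no corrupted file ever becomes visible under the final name.
  static void writeRecoveredEdits(Path finalFile, byte[] edits) throws IOException {
    Path temp = finalFile.resolveSibling(finalFile.getFileName() + ".temp");
    Files.deleteIfExists(temp);               // throw away the old incomplete file
    Files.write(temp, edits);                 // re-create it from the WAL
    Files.move(temp, finalFile,               // publish atomically only on success
        StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
  }

  public static void main(String[] args) throws IOException {
    writeRecoveredEdits(Paths.get("0000000000000001234"), "edits".getBytes());
    System.out.println("recovered edits published");
  }
}
{code}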
> Corrupted hfiles append timestamp every time the region is trying to open
> -------------------------------------------------------------------------
>
> Key: HBASE-25205
> URL: https://issues.apache.org/jira/browse/HBASE-25205
> Project: HBase
> Issue Type: Bug
> Reporter: Junhong Xu
> Assignee: Junhong Xu
> Priority: Major
>
> When an RS crashes, we replay its WALs to generate recovered edits or HFiles
> directly. If the RS replaying the WAL crashes again, the file being written
> may be corrupted. In some cases we may want to move on (e.g. when sinking to
> HFiles directly, since we still have the WAL and replaying it again is OK),
> so we move the corrupted file aside with an extra timestamp as a suffix. But
> if the region is opened again, the corrupted file still can't be opened, and
> it is renamed with yet another timestamp appended. After some rounds of this,
> the file name becomes too long to rename. The log is like this (a sketch of
> the suffix arithmetic follows the log):
> {code:java}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException): The maximum path component name limit of 65378558b0444c27a9d21fb0f4e4293f.1602831270772.1602831291050.1602831296855.1602831408803.1602831493989.1602831584077.1602831600838.1602831659805.1602831736374.1602831738002.1602831959867.1602831979707.1602832095288.1602832103908.1602832538224.1602833079431 in directory /hbase/XXX/data/default/IntegrationTestBigLinkedList/aa376ecf026a5e63d0703384e34ec6aa/meta/recovered.hfiles is exceeded: limit=255 length=256
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxComponentLength(FSDirectory.java:1230)
> at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.verifyFsLimitsForRename(FSDirRenameOp.java:98)
> at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:191)
> at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:493)
> at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:62)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3080)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:1113)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:665)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2742)
> at org.apache.hadoop.ipc.Client.call(Client.java:1504)
> at org.apache.hadoop.ipc.Client.call(Client.java:1435)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy17.rename(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:504)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:249)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:107)
> at com.sun.proxy.$Proxy18.rename(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
> at com.sun.proxy.$Proxy21.rename(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1996)
> at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:605)
> at org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:226)
> {code}
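> To see the arithmetic, a minimal sketch (hypothetical standalone Java; 255 is the HDFS default for dfs.namenode.fs-limits.max-component-length) of how appending one ".<timestamp>" suffix per failed open overruns the component limit:
> {code:java}
> public class SuffixGrowthSketch {
>   public static void main(String[] args) throws InterruptedException {
>     final int MAX_COMPONENT = 255; // HDFS default path-component length limit
>     String name = "65378558b0444c27a9d21fb0f4e4293f"; // 32-char hfile name from the log
>     int rounds = 0;
>     while (name.length() <= MAX_COMPONENT) {
>       name = name + "." + System.currentTimeMillis(); // one failed open, one more suffix
>       rounds++;
>       Thread.sleep(2); // keep the timestamps distinct
>     }
>     // Each suffix adds 14 chars ("." + 13-digit millis): 32 + 16 * 14 = 256 > 255,
>     // exactly the length=256 the NameNode rejects in the log above.
>     System.out.println("length " + name.length() + " after " + rounds + " failed opens");
>   }
> }
> {code}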