[ 
https://issues.apache.org/jira/browse/HBASE-25205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218906#comment-17218906
 ] 

Anoop Sam John commented on HBASE-25205:
----------------------------------------

It's clear now, thanks.
But when the skip-error config is false, the region open will always fail. That is 
the bigger concern. I think I raised this issue somewhere else as well (where we 
discussed making split-to-HFile the default). The RS doing the WAL split going 
down may be a common case.
In the case of split to recovered edits (the old way), what happens to a recovered 
edits file that is partial? When the splitting RS dies, another RS picks up the 
WAL split task. Does it then clean up the previous split attempt and delete those 
partial files? I am not sure.
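
For context, here is a minimal, illustrative sketch of the open-time behavior being 
discussed. It is not the actual HBase code: "skipErrors" stands in for the skip-error 
config mentioned above, and openAndValidate()/rename() are hypothetical placeholders.

{code:java}
// Illustrative sketch only -- not the real HBase implementation.
import java.io.IOException;

public class RecoveredFileOpenSketch {

  /** Handle one recovered file during region open. */
  static void handleRecoveredFile(String path, boolean skipErrors) throws IOException {
    try {
      openAndValidate(path);                 // throws if the file is corrupt/partial
    } catch (IOException corrupt) {
      if (!skipErrors) {
        // Skip-error config is false: the region open fails on every attempt,
        // which is the concern raised above.
        throw corrupt;
      }
      // Skip-error config is true: move the bad file aside and continue.
      // The suffix is appended to the current name, so a file that is moved
      // aside repeatedly keeps growing (see the quoted issue below).
      rename(path, path + "." + System.currentTimeMillis());
    }
  }

  // Hypothetical placeholders so the sketch is self-contained.
  static void openAndValidate(String path) throws IOException { /* validate the file */ }

  static void rename(String from, String to) { /* rename on the filesystem */ }
}
{code}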

> Corrupted hfiles append timestamp every time the region is trying to open
> -------------------------------------------------------------------------
>
>                 Key: HBASE-25205
>                 URL: https://issues.apache.org/jira/browse/HBASE-25205
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Junhong Xu
>            Assignee: Junhong Xu
>            Priority: Major
>
> When the RS crashes, we replay WALs to generate recovered edits or HFiles 
> directly. If the RS replaying the WAL crashes again, the file being written may 
> be corrupted. In some cases we may want to move on (e.g. when sinking to HFiles, 
> since we still have the WAL and replaying it again is OK), so we move the 
> corrupted file aside with an extra timestamp as a suffix. But if the region is 
> opened again, the corrupted file still can't be opened, and it is renamed with 
> yet another timestamp appended. After a few rounds of this, the file name 
> becomes too long to rename. The log is like this:
> {code:java}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException):
>  The maximum path component name limit of 65378558b0444c27a9d21fb0f4e4293f.1602831270772.1602831291050.1602831296855.1602831408803.1602831493989.1602831584077.1602831600838.1602831659805.1602831736374.1602831738002.1602831959867.1602831979707.1602832095288.1602832103908.1602832538224.1602833079431 in directory /hbase/XXX/data/default/IntegrationTestBigLinkedList/aa376ecf026a5e63d0703384e34ec6aa/meta/recovered.hfiles is exceeded: limit=255 length=256
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxComponentLength(FSDirectory.java:1230)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.verifyFsLimitsForRename(FSDirRenameOp.java:98)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:191)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:493)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:62)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3080)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:1113)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:665)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2742)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1504)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1435)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>         at com.sun.proxy.$Proxy17.rename(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:504)
>         at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:249)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:107)
>         at com.sun.proxy.$Proxy18.rename(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
>         at com.sun.proxy.$Proxy21.rename(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1996)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:605)
>         at org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:226)
> {code}
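
To make the quoted failure concrete: each failed open appends one more "." plus a 
13-digit millisecond timestamp (14 characters) to the file name, so the 32-character 
hfile name in the log crosses HDFS's 255-character path-component limit on the 16th 
rename-aside (32 + 16 * 14 = 256), which matches "limit=255 length=256" above. A 
small, self-contained sketch of that arithmetic (the starting name and timestamp are 
taken from the log; nothing here is HBase code):

{code:java}
// Sketch of the name growth shown in the log above.
public class NameGrowthSketch {
  public static void main(String[] args) {
    final int hdfsComponentLimit = 255;                // default dfs.namenode.fs-limits.max-component-length
    String name = "65378558b0444c27a9d21fb0f4e4293f";  // 32-char hfile name from the log
    long ts = 1602831270772L;                          // first timestamp seen in the log
    int renames = 0;
    while (name.length() <= hdfsComponentLimit) {
      name = name + "." + ts++;                        // one more failed open, one more suffix
      renames++;
    }
    // Prints: limit exceeded after 16 rename-asides, length=256
    System.out.println("limit exceeded after " + renames + " rename-asides, length=" + name.length());
  }
}
{code}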



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
