[
https://issues.apache.org/jira/browse/HDFS-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386458#comment-16386458
]
Wei-Chiu Chuang commented on HDFS-10992:
----------------------------------------
I am wondering whether a different bug produces the same error as HDFS-10763. We
saw the same error on a CDH release that includes the HDFS-10763 patch
(CDH 5.13.1). We have no full logs because the NameNode log had been rotated.
The client application is Flume, and the NameNode failed over. The Flume version
we run aborts when a close() times out during the failover (it does not retry)
and then invokes recoverLease().
That said, since the reporter is on a version without the HDFS-10763 fix, I
think we can close this out. I'll file a new JIRA once I gather more information.
> file is under construction but no leases found
> ----------------------------------------------
>
> Key: HDFS-10992
> URL: https://issues.apache.org/jira/browse/HDFS-10992
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.1
> Environment: hortonworks 2.3 build 2557. 10 Datanodes , 2 NameNode in
> auto failover
> Reporter: Chernishev Aleksandr
> Priority: Major
>
> On HDFS, after writing a modest number of files (at least 1000) of size
> 150 MB - 1.6 GB, we found 13 damaged files with an incomplete last block.
> hadoop fsck /hadoop/files/load_tarifer-zf-4_20160902165521521.csv
> -openforwrite -files -blocks -locations
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
> Connecting to namenode via
> http://hadoop-hdfs:50070/fsck?ugi=hdfs&openforwrite=1&files=1&blocks=1&locations=1&path=%2Fstaging%2Flanding%2Fstream%2Fitc_dwh%2Ffiles%2Fload_tarifer-zf-4_20160902165521521.csv
> FSCK started by hdfs (auth:SIMPLE) from /10.0.0.178 for path
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv at Mon Oct 10 17:12:25
> MSK 2016
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv 920596121 bytes, 7
> block(s), OPENFORWRITE: MISSING 1 blocks of total size 115289753 B
> 0. BP-1552885336-10.0.0.178-1446159880991:blk_1084952841_17798971
> len=134217728 repl=4
> [DatanodeInfoWithStorage[10.0.0.188:50010,DS-9ba44a76-113a-43ac-87dc-46aa97ba3267,DISK],
>
> DatanodeInfoWithStorage[10.0.0.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK],
>
> DatanodeInfoWithStorage[10.0.0.184:50010,DS-ec462491-6766-490a-a92f-38e9bb3be5ce,DISK],
>
> DatanodeInfoWithStorage[10.0.0.182:50010,DS-cef46399-bb70-4f1a-ac55-d71c7e820c29,DISK]]
> 1. BP-1552885336-10.0.0.178-1446159880991:blk_1084952850_17799207
> len=134217728 repl=3
> [DatanodeInfoWithStorage[10.0.0.184:50010,DS-412769e0-0ec2-48d3-b644-b08a516b1c2c,DISK],
>
> DatanodeInfoWithStorage[10.0.0.181:50010,DS-97388b2f-c542-417d-ab06-c8d81b94fa9d,DISK],
>
> DatanodeInfoWithStorage[10.0.0.187:50010,DS-e7a11951-4315-4425-a88b-a9f6429cc058,DISK]]
> 2. BP-1552885336-10.0.0.178-1446159880991:blk_1084952857_17799489
> len=134217728 repl=3
> [DatanodeInfoWithStorage[10.0.0.184:50010,DS-7a08c597-b0f4-46eb-9916-f028efac66d7,DISK],
>
> DatanodeInfoWithStorage[10.0.0.180:50010,DS-fa6a4630-1626-43d8-9988-955a86ac3736,DISK],
>
> DatanodeInfoWithStorage[10.0.0.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
> 3. BP-1552885336-10.0.0.178-1446159880991:blk_1084952866_17799725
> len=134217728 repl=3
> [DatanodeInfoWithStorage[10.0.0.185:50010,DS-b5ff8ba0-275e-4846-b5a4-deda35aa0ad8,DISK],
>
> DatanodeInfoWithStorage[10.0.0.180:50010,DS-9cb6cade-9395-4f3a-ab7b-7fabd400b7f2,DISK],
>
> DatanodeInfoWithStorage[10.0.0.183:50010,DS-e277dcf3-1bce-4efd-a668-cd6fb2e10588,DISK]]
> 4. BP-1552885336-10.0.0.178-1446159880991:blk_1084952872_17799891
> len=134217728 repl=4
> [DatanodeInfoWithStorage[10.0.0.184:50010,DS-e1d8f278-1a22-4294-ac7e-e12d554aef7f,DISK],
>
> DatanodeInfoWithStorage[10.0.0.186:50010,DS-5d9aeb2b-e677-41cd-844e-4b36b3c84092,DISK],
>
> DatanodeInfoWithStorage[10.0.0.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK],
>
> DatanodeInfoWithStorage[10.0.0.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
> 5. BP-1552885336-10.0.0.178-1446159880991:blk_1084952880_17800120
> len=134217728 repl=3
> [DatanodeInfoWithStorage[10.0.0.181:50010,DS-79185b75-1938-4c91-a6d0-bb6687ca7e56,DISK],
>
> DatanodeInfoWithStorage[10.0.0.184:50010,DS-dcbd20aa-0334-49e0-b807-d2489f5923c6,DISK],
>
> DatanodeInfoWithStorage[10.0.0.183:50010,DS-f1d77328-f3af-483e-82e9-66ab0723a52c,DISK]]
> 6.
> BP-1552885336-10.0.0.178-1446159880991:blk_1084952887_17800316{UCState=COMMITTED,
> truncateBlock=null, primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-5f3eac72-eb55-4df7-bcaa-a6fa35c166a0:NORMAL:10.0.0.188:50010|RBW],
>
> ReplicaUC[[DISK]DS-a2a0d8f0-772e-419f-b4ff-10b4966c57ca:NORMAL:10.0.0.184:50010|RBW],
>
> ReplicaUC[[DISK]DS-52984aa0-598e-4fff-acfa-8904ca7b585c:NORMAL:10.0.0.185:50010|RBW]]}
> len=115289753 MISSING!
> Status: CORRUPT
> Total size: 920596121 B
> Total dirs: 0
> Total files: 1
> Total symlinks: 0
> Total blocks (validated): 7 (avg. block size 131513731 B)
> ********************************
> UNDER MIN REPL'D BLOCKS: 1 (14.285714 %)
> dfs.namenode.replication.min: 1
> CORRUPT FILES: 1
> MISSING BLOCKS: 1
> MISSING SIZE: 115289753 B
> ********************************
> Minimally replicated blocks: 6 (85.71429 %)
> Over-replicated blocks: 2 (28.571428 %)
> Under-replicated blocks: 0 (0.0 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor: 3
> Average block replication: 2.857143
> Corrupt blocks: 0
> Missing replicas: 0 (0.0 %)
> Number of data-nodes: 10
> Number of racks: 1
> FSCK ended at Mon Oct 10 17:12:25 MSK 2016 in 0 milliseconds
> The filesystem under path
> '/hadoop/files/load_tarifer-zf-4_20160902165521521.csv' is CORRUPT
> The file is UNDER_RECOVERY: the NameNode thinks the last block is in the
> COMMITTED state, while the DataNode thinks the block is in the RBW state, so
> recovery is not executed. The last block file and its meta file exist in the
> 'rbw' directory on the DataNode:
> -rw-r--r-- 1 hdfs hdfs 115289753 Sep 2 16:56
> /hadoopdir/data/current/BP-1552885336-10.0.0.178-1446159880991/current/rbw/blk_1084952887
> -rw-r--r-- 1 hdfs hdfs 900711 Sep 2 16:56
> /hadoopdir/data/current/BP-1552885336-10.0.0.178-1446159880991/current/rbw/blk_1084952887_17800316.meta
> The lease recovery tool said:
> hdfs debug recoverLease -path
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv
> Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
> recoverLease got exception:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> Failed to RECOVER_LEASE
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv for
> DFSClient_NONMAPREDUCE_-1462314354_1 on 10.0.0.178 because the file is under
> construction but no leases found.
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2892)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2835)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:668)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:663)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
> at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy9.recoverLease(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.recoverLease(ClientNamenodeProtocolTranslatorPB.java:603)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy10.recoverLease(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.recoverLease(DFSClient.java:1259)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:279)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:275)
> at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.recoverLease(DistributedFileSystem.java:275)
> at
> org.apache.hadoop.hdfs.tools.DebugAdmin$RecoverLeaseCommand.run(DebugAdmin.java:256)
> at org.apache.hadoop.hdfs.tools.DebugAdmin.run(DebugAdmin.java:336)
> at org.apache.hadoop.hdfs.tools.DebugAdmin.main(DebugAdmin.java:359)
> Giving up on recoverLease for
> /hadoop/files/load_tarifer-zf-4_20160902165521521.csv after 1 try.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]