[
https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209022#comment-14209022
]
Ravi Prakash commented on HDFS-7342:
------------------------------------
Some details I've been able to gather from the logs on a cluster running Hadoop
2.2.0:
The client logs:
{noformat}
2014-10-27 19:46:54,952 INFO [Thread-60]
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS
hdfs://<NAMENODE>:8020/<FILENAME>
..... nothing related to this file.......
2014-10-28 01:18:26,018 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not
complete <FILENAME> retrying...
2014-10-28 01:18:26,419 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not
complete <FILENAME> retrying...
...goes on for 10 mins.....
2014-10-28 01:28:24,481 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not
complete <FILENAME> retrying...
2014-10-28 01:28:24,883 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not
complete <FILENAME> retrying...
{noformat}
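The ~400 ms spacing between consecutive "Could not complete" lines is consistent
with a fixed retry sleep in the client's close path: complete() on the NameNode
keeps returning false, and absent a client-side timeout the client retries
indefinitely. A rough, self-contained model of that loop (paraphrased from
DFSOutputStream#completeFile in branch-2, not the exact 2.2.0 source;
namenodeComplete() is a hypothetical stand-in for the ClientProtocol.complete()
RPC):
{code:java}
// Minimal model of the client-side loop that produces the
// "Could not complete <FILENAME> retrying..." lines above. Paraphrased
// from DFSOutputStream#completeFile in branch-2 -- NOT the exact 2.2.0
// source. namenodeComplete() is a hypothetical stand-in for the
// ClientProtocol.complete() RPC to the NameNode.
public class CompleteFileRetryModel {

  // Stand-in for the complete() RPC: returns true once the NameNode
  // considers the last block COMPLETE (minimally replicated). In the
  // incident above it kept returning false for ~10 minutes.
  static boolean namenodeComplete(String src) {
    return false;
  }

  static void completeFile(String src, long timeoutMs)
      throws java.io.IOException {
    final long start = System.currentTimeMillis();
    boolean fileComplete = false;
    while (!fileComplete) {
      fileComplete = namenodeComplete(src);
      if (fileComplete) {
        break;
      }
      // Only a client-side timeout breaks out of the loop; with no
      // timeout configured, the client retries indefinitely.
      if (timeoutMs > 0 && System.currentTimeMillis() - start > timeoutMs) {
        throw new java.io.IOException("Unable to close file " + src);
      }
      try {
        Thread.sleep(400); // fixed sleep; matches the ~400 ms gap in the log
        System.out.println("Could not complete " + src + " retrying...");
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}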
The NameNode logs, grepping for <FILENAME>:
{noformat}
2014-10-27 19:46:58,041 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_A{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE1>:50010|RBW],
ReplicaUnderConstruction[<SLAVE8>:50010|RBW]]}
2014-10-27 20:13:26,607 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_B{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE2>:50010|RBW],
ReplicaUnderConstruction[<SLAVE9>:50010|RBW]]}
2014-10-27 20:47:52,422 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_C{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE3>:50010|RBW],
ReplicaUnderConstruction[<SLAVE10>:50010|RBW]]}
2014-10-27 21:23:13,844 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_D{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE4>:50010|RBW],
ReplicaUnderConstruction[<SLAVE11>:50010|RBW]]}
2014-10-27 22:02:33,405 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_E{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE5>:50010|RBW],
ReplicaUnderConstruction[<SLAVE12>:50010|RBW]]}
2014-10-27 22:42:49,227 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_F{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE6>:50010|RBW],
ReplicaUnderConstruction[<SLAVE15>:50010|RBW]]}
2014-10-27 23:25:58,555 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_G{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE7>:50010|RBW],
ReplicaUnderConstruction[<SLAVE12>:50010|RBW]]}
2014-10-28 00:07:36,093 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_H{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE2>:50010|RBW],
ReplicaUnderConstruction[<SLAVE13>:50010|RBW]]}
2014-10-28 01:13:50,298 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_I{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE1>:50010|RBW],
ReplicaUnderConstruction[<SLAVE14>:50010|RBW]]}
2014-10-28 01:18:20,868 INFO org.apache.hadoop.hdfs.StateChange: DIR*
completeFile: <FILENAME> is closed by DFSClient_attempt_X_Y_r_T_U_V_W
2014-10-28 01:18:21,272 INFO org.apache.hadoop.hdfs.StateChange: DIR*
completeFile: <FILENAME> is closed by DFSClient_attempt_X_Y_r_T_U_V_W
....this keeps repeating, interspersed with other logs, until....
2014-10-28 01:28:24,483 INFO org.apache.hadoop.hdfs.StateChange: DIR*
completeFile: <FILENAME> is closed by DFSClient_attempt_X_Y_r_T_U_V_W
2014-10-28 01:28:25,615 INFO org.apache.hadoop.hdfs.StateChange: DIR*
completeFile: <FILENAME> is closed by DFSClient_attempt_X_Y_r_T_U_V_W
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src=<FILENAME>
......BOOM.... The NN is now stuck in an infinite loop. Only the following
two messages keep getting repeated:
2014-10-28 02:28:17,568 INFO
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder:
DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1] has expired hard limit
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src=<FILENAME>
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder:
DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1] has expired hard limit
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src=<FILENAME>
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder:
DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1] has expired hard limit
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src=<FILENAME>
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder:
DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1] has expired hard limit
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src=<FILENAME>
{noformat}
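For context on the repeating pair of messages: the lease monitor repeatedly
takes the oldest expired lease and asks FSNamesystem to recover it; if recovery
neither closes the file nor removes/renews the lease, the same lease stays at
the head of the sorted set and the loop spins. A minimal model of that failure
mode (paraphrased from LeaseManager#checkLeases; names and details are
simplified, and internalReleaseLease() here is a stand-in that always fails to
make progress):
{code:java}
import java.util.TreeSet;

// Minimal model of the spin: the monitor always re-reads the oldest
// expired lease, and nothing ever removes, renews, or reassigns it.
// Paraphrased from LeaseManager#checkLeases -- simplified, not the
// exact 2.2.0 source.
public class LeaseLoopModel {
  static final long HARD_LIMIT_MS = 60L * 60 * 1000; // 1 hour hard limit

  static class Lease implements Comparable<Lease> {
    final String holder;
    final long lastUpdate;
    Lease(String holder, long lastUpdate) {
      this.holder = holder;
      this.lastUpdate = lastUpdate;
    }
    boolean expiredHardLimit() {
      return System.currentTimeMillis() - lastUpdate > HARD_LIMIT_MS;
    }
    @Override
    public int compareTo(Lease o) {
      return Long.compare(lastUpdate, o.lastUpdate);
    }
  }

  final TreeSet<Lease> sortedLeases = new TreeSet<>();

  // Stand-in for FSNamesystem#internalReleaseLease. Returns true only
  // if the file could be closed immediately. The bad path modeled here:
  // it returns without closing the file AND without removing or
  // reassigning the lease.
  boolean internalReleaseLease(Lease lease, String src) {
    System.out.println("Recovering [Lease.  Holder: " + lease.holder
        + ", pendingcreates: 1], src=" + src);
    return false;
  }

  void checkLeases() {
    while (!sortedLeases.isEmpty()
        && sortedLeases.first().expiredHardLimit()) {
      Lease oldest = sortedLeases.first();
      System.out.println(oldest.holder + " has expired hard limit");
      boolean completed = internalReleaseLease(oldest, "<FILENAME>");
      if (!completed) {
        // BUG as observed: the lease was neither removed nor renewed,
        // so the next iteration picks the same lease again -- an
        // infinite loop emitting the two log lines above. The fix
        // direction (cf. HDFS-4882) is to guarantee progress on every
        // iteration.
      }
    }
  }
}
{code}
In the real code, when recovery is started the lease is supposed to be
reassigned to the NameNode's own holder and renewed so it drops out of the
expired set; the loop above spins exactly when that bookkeeping is skipped,
which matches the identical-timestamp log lines above.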
> Lease Recovery doesn't happen some times
> ----------------------------------------
>
> Key: HDFS-7342
> URL: https://issues.apache.org/jira/browse/HDFS-7342
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.0.0-alpha
> Reporter: Ravi Prakash
>
> In some cases, the LeaseManager tries to recover a lease but is not able to.
> HDFS-4882 describes one way this can happen. We should fix this.