[
https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209022#comment-14209022
]
Ravi Prakash commented on HDFS-7342:
------------------------------------
Some details I've been able to gather from the logs on a cluster running Hadoop
2.2.0:
The client logs:
{noformat}
2014-10-27 19:46:54,952 INFO [Thread-60]
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS
hdfs://<NAMENODE>:8020/<FILENAME>
..... nothing related to this file.......
2014-10-28 01:18:26,018 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not
complete <FILENAME> retrying...
2014-10-28 01:18:26,419 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not
complete <FILENAME> retrying...
...goes on for 10 mins.....
2014-10-28 01:28:24,481 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not
complete <FILENAME> retrying...
2014-10-28 01:28:24,883 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not
complete <FILENAME> retrying...
{noformat}
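The ~400 ms spacing between consecutive "Could not complete" lines is consistent
with a fixed retry sleep in the client's close path: complete() on the NameNode
keeps returning false, and absent a client-side timeout the client retries
indefinitely. A rough, self-contained model of that loop (paraphrased from
DFSOutputStream#completeFile in branch-2, not the exact 2.2.0 source;
namenodeComplete() is a hypothetical stand-in for the ClientProtocol.complete()
RPC):
{code:java}
// Minimal model of the client-side loop that produces the
// "Could not complete <FILENAME> retrying..." lines above. Paraphrased
// from DFSOutputStream#completeFile in branch-2 -- NOT the exact 2.2.0
// source. namenodeComplete() is a hypothetical stand-in for the
// ClientProtocol.complete() RPC to the NameNode.
public class CompleteFileRetryModel {

  // Stand-in for the complete() RPC: returns true once the NameNode
  // considers the last block COMPLETE (minimally replicated). In the
  // incident above it kept returning false for ~10 minutes.
  static boolean namenodeComplete(String src) {
    return false;
  }

  static void completeFile(String src, long timeoutMs)
      throws java.io.IOException {
    final long start = System.currentTimeMillis();
    boolean fileComplete = false;
    while (!fileComplete) {
      fileComplete = namenodeComplete(src);
      if (fileComplete) {
        break;
      }
      // Only a client-side timeout breaks out of the loop; with no
      // timeout configured, the client retries indefinitely.
      if (timeoutMs > 0 && System.currentTimeMillis() - start > timeoutMs) {
        throw new java.io.IOException("Unable to close file " + src);
      }
      try {
        Thread.sleep(400); // fixed sleep; matches the ~400 ms gap in the log
        System.out.println("Could not complete " + src + " retrying...");
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}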
The NameNode logs, grepping for <FILENAME>:
{noformat}
2014-10-27 19:46:58,041 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_A{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE1>:50010|RBW],
ReplicaUnderConstruction[<SLAVE8>:50010|RBW]]}
2014-10-27 20:13:26,607 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_B{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE2>:50010|RBW],
ReplicaUnderConstruction[<SLAVE9>:50010|RBW]]}
2014-10-27 20:47:52,422 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_C{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE3>:50010|RBW],
ReplicaUnderConstruction[<SLAVE10>:50010|RBW]]}
2014-10-27 21:23:13,844 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_D{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE4>:50010|RBW],
ReplicaUnderConstruction[<SLAVE11>:50010|RBW]]}
2014-10-27 22:02:33,405 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_E{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE5>:50010|RBW],
ReplicaUnderConstruction[<SLAVE12>:50010|RBW]]}
2014-10-27 22:42:49,227 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_F{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE6>:50010|RBW],
ReplicaUnderConstruction[<SLAVE15>:50010|RBW]]}
2014-10-27 23:25:58,555 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_G{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE7>:50010|RBW],
ReplicaUnderConstruction[<SLAVE12>:50010|RBW]]}
2014-10-28 00:07:36,093 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_H{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE2>:50010|RBW],
ReplicaUnderConstruction[<SLAVE13>:50010|RBW]]}
2014-10-28 01:13:50,298 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
allocateBlock: <FILENAME>. <BlockPoolID>
blk_A_I{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[<CLIENT_IP>:50010|RBW],
ReplicaUnderConstruction[<SLAVE1>:50010|RBW],
ReplicaUnderConstruction[<SLAVE14>:50010|RBW]]}
2014-10-28 01:18:20,868 INFO org.apache.hadoop.hdfs.StateChange: DIR*
completeFile: <FILENAME> is closed by DFSClient_attempt_X_Y_r_T_U_V_W
2014-10-28 01:18:21,272 INFO org.apache.hadoop.hdfs.StateChange: DIR*
completeFile: <FILENAME> is closed by DFSClient_attempt_X_Y_r_T_U_V_W
....this keeps repeating, interspersed with other logs, until....
2014-10-28 01:28:24,483 INFO org.apache.hadoop.hdfs.StateChange: DIR*
completeFile: <FILENAME> is closed by DFSClient_attempt_X_Y_r_T_U_V_W
2014-10-28 01:28:25,615 INFO org.apache.hadoop.hdfs.StateChange: DIR*
completeFile: <FILENAME> is closed by DFSClient_attempt_X_Y_r_T_U_V_W
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src=<FILENAME>
......BOOM.... The NN is now stuck in an infinite loop. Only the following
two messages keep getting repeated:
2014-10-28 02:28:17,568 INFO
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder:
DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1] has expired hard limit
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src=<FILENAME>
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder:
DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1] has expired hard limit
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src=<FILENAME>
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder:
DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1] has expired hard limit
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src=<FILENAME>
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder:
DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1] has expired hard limit
2014-10-28 02:28:17,569 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src=<FILENAME>
{noformat}
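For context on the repeating pair of messages: the lease monitor repeatedly
takes the oldest expired lease and asks FSNamesystem to recover it; if recovery
neither closes the file nor removes/renews the lease, the same lease stays at
the head of the sorted set and the loop spins. A minimal model of that failure
mode (paraphrased from LeaseManager#checkLeases; names and details are
simplified, and internalReleaseLease() here is a stand-in that always fails to
make progress):
{code:java}
import java.util.TreeSet;

// Minimal model of the spin: the monitor always re-reads the oldest
// expired lease, and nothing ever removes, renews, or reassigns it.
// Paraphrased from LeaseManager#checkLeases -- simplified, not the
// exact 2.2.0 source.
public class LeaseLoopModel {
  static final long HARD_LIMIT_MS = 60L * 60 * 1000; // 1 hour hard limit

  static class Lease implements Comparable<Lease> {
    final String holder;
    final long lastUpdate;
    Lease(String holder, long lastUpdate) {
      this.holder = holder;
      this.lastUpdate = lastUpdate;
    }
    boolean expiredHardLimit() {
      return System.currentTimeMillis() - lastUpdate > HARD_LIMIT_MS;
    }
    @Override
    public int compareTo(Lease o) {
      return Long.compare(lastUpdate, o.lastUpdate);
    }
  }

  final TreeSet<Lease> sortedLeases = new TreeSet<>();

  // Stand-in for FSNamesystem#internalReleaseLease. Returns true only
  // if the file could be closed immediately. The bad path modeled here:
  // it returns without closing the file AND without removing or
  // reassigning the lease.
  boolean internalReleaseLease(Lease lease, String src) {
    System.out.println("Recovering [Lease.  Holder: " + lease.holder
        + ", pendingcreates: 1], src=" + src);
    return false;
  }

  void checkLeases() {
    while (!sortedLeases.isEmpty()
        && sortedLeases.first().expiredHardLimit()) {
      Lease oldest = sortedLeases.first();
      System.out.println(oldest.holder + " has expired hard limit");
      boolean completed = internalReleaseLease(oldest, "<FILENAME>");
      if (!completed) {
        // BUG as observed: the lease was neither removed nor renewed,
        // so the next iteration picks the same lease again -- an
        // infinite loop emitting the two log lines above. The fix
        // direction (cf. HDFS-4882) is to guarantee progress on every
        // iteration.
      }
    }
  }
}
{code}
In the real code, when recovery is started the lease is supposed to be
reassigned to the NameNode's own holder and renewed so it drops out of the
expired set; the loop above spins exactly when that bookkeeping is skipped,
which matches the identical-timestamp log lines above.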
> Lease Recovery doesn't happen some times
> ----------------------------------------
>
> Key: HDFS-7342
> URL: https://issues.apache.org/jira/browse/HDFS-7342
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.0.0-alpha
> Reporter: Ravi Prakash
>
> In some cases, the LeaseManager tries to recover a lease but is not able to.
> HDFS-4882 describes one way this can happen. We should fix this.