Hi All,
I am seeing the following problem. A client tries to close a file, but the
close has to wait for a block to replicate. Before the replication
finishes, the lease times out and is removed, which then causes the write
to fail. Here is the log in sequence:
2006-06-12 01:58:07,999 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.pendingTransfer: ask andromeda01.visvo.com:50010 to replicate
blk_-8050014683592461050 to datanode(s) andromeda07.visvo.com:50010
2006-06-12 02:02:16,915 INFO org.apache.hadoop.fs.FSNamesystem: Removing
lease [Lease. Holder: DFSClient_1073590514, heldlocks: 0,
pendingcreates: 0], leases remaining: 11
2006-06-12 02:02:20,175 INFO org.apache.hadoop.ipc.Server: Server
connection on port 9000 from 192.168.1.240: exiting
2006-06-12 02:02:27,293 WARN org.apache.hadoop.dfs.StateChange: DIR*
NameSystem.completeFile: failed to complete
/user/phoenix/crawl/newsegs/20060612002155/parse_data/part-00008/data
because dir.getFile()==null and null
2006-06-12 02:02:27,324 INFO org.apache.hadoop.ipc.Server: Server
handler 0 on 9000 call error: java.io.IOException: Could not complete
write to file
/user/phoenix/crawl/newsegs/20060612002155/parse_data/part-00008/data by
DFSClient_1073590514
java.io.IOException: Could not complete write to file
/user/phoenix/crawl/newsegs/20060612002155/parse_data/part-00008/data by
DFSClient_1073590514
at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:240)
at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:243)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:231)
2006-06-12 02:02:27,962 INFO org.apache.hadoop.ipc.Server: Server
connection on port 9000 from 192.168.1.237: exiting
2006-06-12 02:02:28,791 INFO org.apache.hadoop.ipc.Server: Server
connection on port 9000 from 192.168.1.243: exiting
I think the following call should be added at line 1073 of the DFSClient
file, inside DFSOutputStream:
namenode.renewLease( clientName.toString());
This would renew the lease while the client is waiting for the file to
complete (most likely waiting on replication). I don't know the core of
Hadoop well enough yet to tell whether this would cause other problems,
so I wanted to get some feedback before I submit a patch.
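To make the idea concrete, here is a minimal, self-contained sketch. It is
not the actual DFSClient code. It assumes the close path polls the namenode
until the file is reported complete; NameNodeStub, FakeNameNode and
completeWithRenewal are hypothetical names made up for this example, and the
poll interval is arbitrary. Only the renewLease(clientName) call and the
existence of a complete call are taken from the snippet and log above.

```java
public class LeaseRenewSketch {

    /** Hypothetical stand-in for the two namenode calls involved; not the real Hadoop interface. */
    interface NameNodeStub {
        boolean complete(String src, String clientName);
        void renewLease(String clientName);
    }

    /**
     * Polls complete() until it succeeds, renewing the lease after every
     * failed attempt so it cannot expire while the client is waiting.
     * Returns the number of renewals that were needed.
     */
    static int completeWithRenewal(NameNodeStub namenode, String src,
                                   String clientName, long sleepMillis) {
        int renewals = 0;
        while (!namenode.complete(src, clientName)) {
            namenode.renewLease(clientName);   // the proposed extra call
            renewals++;
            try {
                Thread.sleep(sleepMillis);     // assumed poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for completion", e);
            }
        }
        return renewals;
    }

    /** Fake namenode that reports the file complete only after a fixed number of failed polls. */
    static class FakeNameNode implements NameNodeStub {
        private int failedPollsLeft;
        int renewCalls = 0;

        FakeNameNode(int failedPolls) {
            this.failedPollsLeft = failedPolls;
        }

        public boolean complete(String src, String clientName) {
            if (failedPollsLeft > 0) {
                failedPollsLeft--;
                return false;                  // still waiting, e.g. on replication
            }
            return true;
        }

        public void renewLease(String clientName) {
            renewCalls++;
        }
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode(3);
        int renewals = completeWithRenewal(nn, "/tmp/example/data", "DFSClient_example", 10L);
        System.out.println("renewals = " + renewals);
    }
}
```

With the fake namenode set to fail three polls, the loop renews the lease
three times before complete() succeeds. Whether the real close path can
safely renew on every poll, or should renew less often, is exactly the
open question.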
Please let me know if this is a valid change or if it causes other problems.
Dennis