The IOE below has a matching complaint in the datanode. Here are the matching client and datanode exceptions:

CLIENT:

2007-11-28 21:38:39,391 DEBUG hbase.HStore - started compaction of 3 files in /hbase/compaction.dir/hregion_-1314266377/cookie
2007-11-28 21:38:39,398 DEBUG fs.DFSClient - Failed to connect to /XX.XX.XX.17:50010:java.io.IOException: Got error in response to OP_READ_BLOCK
        at org.apache.hadoop.dfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:753)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:979)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1075)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1027)
        at java.io.FilterInputStream.read(FilterInputStream.java:66)
        at java.io.DataInputStream.readByte(DataInputStream.java:248)
        at org.apache.hadoop.hbase.HStoreFile.loadInfo(HStoreFile.java:590)
        at org.apache.hadoop.hbase.HStore.compact(HStore.java:1007)
        at org.apache.hadoop.hbase.HRegion.compactStores(HRegion.java:745)
        at org.apache.hadoop.hbase.HRegion.compactIfNeeded(HRegion.java:704)
        at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:378)



DATANODE:

2007-11-28 21:38:39,397 WARN  dfs.DataNode - XX.XX.XX.17:50010:Got exception while serving blk_5964910456479636527 to /XX.XX.XX.17:
java.io.IOException: Block blk_5964910456479636527 is not valid.
        at org.apache.hadoop.dfs.FSDataset.getBlockFile(FSDataset.java:549)
        at org.apache.hadoop.dfs.FSDataset.getMetaFile(FSDataset.java:466)
        at org.apache.hadoop.dfs.FSDataset.metaFileExists(FSDataset.java:470)
        at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1281)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:897)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
        at java.lang.Thread.run(Thread.java:619)

2007-11-28 21:38:39,397 ERROR dfs.DataNode - XX.XX.XX.17:50010:DataXceiver: 
java.io.IOException: Block blk_5964910456479636527 is not valid.
        at org.apache.hadoop.dfs.FSDataset.getBlockFile(FSDataset.java:549)
        at org.apache.hadoop.dfs.FSDataset.getMetaFile(FSDataset.java:466)
        at org.apache.hadoop.dfs.FSDataset.metaFileExists(FSDataset.java:470)
        at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1281)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:897)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:849)
        at java.lang.Thread.run(Thread.java:619)


How should I recover from this (and from the AlreadyBeingCreatedException below, which has no matching namenode complaint)?

Thanks,
St.Ack



stack wrote:
Reading down through the dfs code, it looks like at least the AlreadyBeingCreatedException should be getting retried. Why, then, am I seeing it in my code (and what should I do about the IOE below, which has no matching complaint in the namenode)?

Thanks,
St.Ack

stack wrote:
Here's more:

2007-11-27 05:00:26,331 DEBUG fs.DFSClient - Failed to connect to /XX.XX.XX.30:50010:java.io.IOException: Got error in response to OP_READ_BLOCK
    at org.apache.hadoop.dfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:753)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:979)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1075)
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1360)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
    at org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:263)
    at org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242)
    at org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.<init>(HStoreFile.java:820)
    at org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:932)
    at org.apache.hadoop.hbase.HStore.compactHStoreFiles(HStore.java:1084)
    at org.apache.hadoop.hbase.HStore.compact(HStore.java:1017)
    at org.apache.hadoop.hbase.HRegion.compactStores(HRegion.java:745)
    at org.apache.hadoop.hbase.HRegion.compactIfNeeded(HRegion.java:704)
    at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:378)


Am I supposed to retry?
Will that even make a difference? Every ten minutes or so the client retries getting a reader on either the same file or another one, and it gets the same error.
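If an explicit retry is what's intended, I imagine it looks something like the sketch below (MAX_ATTEMPTS and PAUSE_MS are names I made up, and fs/dirName/conf are whatever the caller already has in hand; opening the MapFile.Reader is the call that blows up in the trace above):

    // Sketch only: bounded retry with a pause around opening the reader.
    MapFile.Reader openWithRetries(final FileSystem fs, final String dirName,
        final Configuration conf) throws IOException {
      final int MAX_ATTEMPTS = 3;        // made-up bound
      final long PAUSE_MS = 10 * 1000;   // made-up pause
      IOException lastFailure = null;
      for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        try {
          return new MapFile.Reader(fs, dirName, conf);
        } catch (IOException e) {
          lastFailure = e;               // e.g. "Got error in response to OP_READ_BLOCK"
          try {
            Thread.sleep(PAUSE_MS);      // back off before trying again
          } catch (InterruptedException ie) {
            throw new IOException("Interrupted retrying open of " + dirName);
          }
        }
      }
      throw lastFailure;                 // retries exhausted
    }

Though given that the periodic retry already hits the same error, I am not sure a tighter loop like this buys anything.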

Thanks for any insight,
St.Ack


stack wrote:
Anyone have insight on the following message from a near-TRUNK namenode log?

2007-11-26 01:16:23,282 WARN dfs.StateChange - DIR* NameSystem.startFile: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client 38.99.77.80 because current leaseholder is trying to recreate file.

It starts for no apparent reason. There is no exception or warning preceding it in the log, and the pertinent code had been creating files in dfs without trouble for a good while beforehand. After it starts, it repeats every minute until the client side goes away. While the client is up, it retries creating the file every 5 minutes or so.

From the client-side, the exception looks like this:

2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
    at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
    at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
    at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
    at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)

The code is not exotic:

SequenceFile.Writer w = SequenceFile.createWriter(fs, conf, logfile, HLogKey.class, HLogEdit.class);

(See HLog in hbase around line #180 if you want more context.)

I see the comments in the incomplete HADOOP-2050. There is some implication that an attempt at a create should be wrapped in a try/catch that retries on any AlreadyBeingCreatedException. Should I be doing this (it looks like it's done in DFSClient)?
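If so, I imagine it would look something like this sketch (maxRetries and retryWaitMs are names I made up, and it assumes the RemoteException coming back over RPC has already been decoded back into an AlreadyBeingCreatedException, as RemoteExceptionHandler.decodeRemoteException appears to do in the trace above):

    // Sketch only: keep retrying the create while the namenode says the
    // previous lease holder is still trying to recreate the file.
    SequenceFile.Writer w = null;
    for (int attempt = 0; w == null; attempt++) {
      try {
        w = SequenceFile.createWriter(fs, conf, logfile, HLogKey.class,
          HLogEdit.class);
      } catch (AlreadyBeingCreatedException e) {
        if (attempt >= maxRetries) {
          throw e;                     // give up; the old lease never freed up
        }
        try {
          Thread.sleep(retryWaitMs);   // wait out the previous holder's lease
        } catch (InterruptedException ie) {
          throw new IOException("Interrupted retrying create of " + logfile);
        }
      }
    }

I would rather not hand-roll this in hbase if dfs is going to do the retrying for me, hence the question.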

Thanks for any insight,
St.Ack

P.S. HADOOP-2283 is the issue I've opened to cover this particular topic.





