Here's more:
2007-11-27 05:00:26,331 DEBUG fs.DFSClient - Failed to connect to /38.99.76.30:50010:java.io.IOException: Got error in response to OP_READ_BLOCK
    at org.apache.hadoop.dfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:753)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:979)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1075)
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1360)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
    at org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:263)
    at org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242)
    at org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.<init>(HStoreFile.java:820)
    at org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:932)
    at org.apache.hadoop.hbase.HStore.compactHStoreFiles(HStore.java:1084)
    at org.apache.hadoop.hbase.HStore.compact(HStore.java:1017)
    at org.apache.hadoop.hbase.HRegion.compactStores(HRegion.java:745)
    at org.apache.hadoop.hbase.HRegion.compactIfNeeded(HRegion.java:704)
    at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:378)
Am I supposed to retry?
Will that even make a difference? Every ten minutes or so, the client
retries getting a reader on either the same file or another one, and gets the same error.
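If an explicit retry around opening the reader is what's expected, I'd guess it
would look something like the sketch below. This is not the actual HStore code;
the method name, directory string, attempt count, and backoff are all made up
for illustration:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;

// Hypothetical bounded retry around opening the MapFile reader.  Attempt
// count and backoff are made-up numbers, not what HStore actually does.
MapFile.Reader openReaderWithRetry(FileSystem fs, String mapFileDir,
    Configuration conf) throws IOException {
  IOException lastFailure = null;
  for (int attempt = 0; attempt < 3; attempt++) {
    try {
      return new MapFile.Reader(fs, mapFileDir, conf);
    } catch (IOException e) {
      // e.g. "Got error in response to OP_READ_BLOCK" from the datanode.
      lastFailure = e;
      try {
        Thread.sleep(10 * 1000);
      } catch (InterruptedException ie) {
        throw new IOException("Interrupted retrying " + mapFileDir);
      }
    }
  }
  throw lastFailure;
}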
Thanks for any insight,
St.Ack
stack wrote:
Anyone have insight on the following message from a near-TRUNK
namenode log?
2007-11-26 01:16:23,282 WARN dfs.StateChange - DIR*
NameSystem.startFile: failed to create file
/hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on
client 38.99.77.80 because current leaseholder is trying to recreate
file.
It starts for no apparent reason. There is no exception or warning
preceding it in the log, and creation of files in DFS by the pertinent
code had been working fine for a good while before this. After it
starts, it repeats every minute until the client side goes away.
While the client is up, it retries creating the file every 5 minutes
or so.
From the client side, the exception looks like this:
2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
    at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
    at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
    at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
    at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
The code is not exotic:
SequenceFile.Writer w = SequenceFile.createWriter(fs, conf, logfile,
    HLogKey.class, HLogEdit.class);
(See HLog from hbase around #180 if you want to see more context).
I see the comments in the incomplete HADOOP-2050. There is some
implication that an attempt at a create should be wrapped in a
try/catch that retries on any AlreadyBeingCreatedException. Should I be
doing this (it looks like it's done in DFSClient)?
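For concreteness, here is roughly what I read HADOOP-2050 as suggesting. This
is only a sketch, not what DFSClient actually does; the attempt count and sleep
are arbitrary, and the class-name check is just my guess at handling the case
where the namenode's exception comes back wrapped in a RemoteException:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

// Sketch of the retry-on-AlreadyBeingCreatedException idea.  The attempt
// count and backoff are arbitrary numbers chosen for illustration.
SequenceFile.Writer createWriterWithRetry(FileSystem fs, Configuration conf,
    Path logfile) throws IOException {
  for (int attempt = 0; attempt < 10; attempt++) {
    try {
      return SequenceFile.createWriter(fs, conf, logfile,
          HLogKey.class, HLogEdit.class);
    } catch (IOException e) {
      // The namenode's AlreadyBeingCreatedException may arrive wrapped in a
      // RemoteException, so match on the class name rather than the type.
      if (e.getClass().getName().indexOf("AlreadyBeingCreatedException") < 0
          && String.valueOf(e.getMessage())
              .indexOf("AlreadyBeingCreatedException") < 0) {
        throw e;
      }
      // Previous holder's lease has not expired yet; back off and retry.
      try {
        Thread.sleep(10 * 1000);
      } catch (InterruptedException ie) {
        throw new IOException("Interrupted waiting to recreate " + logfile);
      }
    }
  }
  throw new IOException("Gave up creating " + logfile);
}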
Thanks for any insight,
St.Ack
P.S. HADOOP-2283 is the issue I've opened to cover this particular topic.