[ https://issues.apache.org/jira/browse/HADOOP-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481486 ]

Sameer Paranjpye commented on HADOOP-1093:
------------------------------------------

So Nigel and I were poking around the Namenode logs, trying to trace all the 
events in the lifetime of a block. We have a theory that seems to fit all the 
symptoms.

If you look through the life story of blk_391490390916935261, the following 
events appear:

2007-03-16 01:21:05,248 DEBUG org.apache.hadoop.dfs.StateChange: *BLOCK* NameNode.reportWrittenBlock: blk_391490390916935261 is written to 1 locations
2007-03-16 01:21:05,248 DEBUG org.apache.hadoop.dfs.StateChange: *BLOCK* NameNode.blockReport: from 72.30.52.207:50010 2627 blocks
2007-03-16 01:21:05,250 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.blockReceived: blk_391490390916935261 is received from 72.30.52.207:50010
2007-03-16 01:21:05,250 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 72.30.52.207:50010 is added to blk_391490390916935261
2007-03-16 01:21:05,254 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.processReport: from 72.30.52.207:50010 2627 blocks
2007-03-16 01:21:05,255 DEBUG org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.removeStoredBlock: blk_391490390916935261 from 72.30.52.207:50010
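Tracing a block this way amounts to pulling out, in log order, every line that 
mentions the block ID or the DataNode holding it, so that the blockReport and 
processReport lines (which never name the block) show up next to the block's 
own events. A minimal sketch of that filter, assuming the log is already loaded 
as a list of lines; BlockTrace and trace are hypothetical names, not Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop): keep every log line that mentions
// any of the given keys, e.g. a block ID and the DataNode that holds it,
// preserving log order so the block's "life story" reads top to bottom.
class BlockTrace {
    static List<String> trace(List<String> logLines, Set<String> keys) {
        List<Pattern> patterns = new ArrayList<>();
        for (String key : keys) {
            // Digit guards on both sides stop "blk_1" from matching "blk_12"
            // and "72.30.52.207:50010" from matching "172.30.52.207:50010".
            patterns.add(Pattern.compile("(?<![0-9])" + Pattern.quote(key) + "(?![0-9])"));
        }
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            for (Pattern p : patterns) {
                if (p.matcher(line).find()) {
                    hits.add(line);
                    break; // add each line once even if it matches several keys
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "01:21:05,248 NameNode.reportWrittenBlock: blk_391490390916935261 is written to 1 locations",
            "01:21:05,248 NameNode.blockReport: from 72.30.52.207:50010 2627 blocks",
            "01:21:05,251 some unrelated event",
            "01:21:05,255 NameSystem.removeStoredBlock: blk_391490390916935261 from 72.30.52.207:50010");
        trace(log, Set.of("blk_391490390916935261", "72.30.52.207:50010"))
            .forEach(System.out::println);
    }
}
```

Keying on the DataNode address as well as the block ID is the important part: 
the stale block report is only visible through the DataNode it came from.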

What seems to be happening is a race between a client reporting a block and a 
block report from a DataNode:

- the client reports a written block
- immediately after this, the NameNode receives a block report from the DataNode 
that *doesn't* contain this block, because the report was generated while the 
block was still in flight
- the NameNode trusts the block report and blithely removes the block from the 
block map
- the next attempt by the client to allocate a block fails with a 
NotReplicatedYetException, since the previous block now has no known replicas 
and so falls short of the minimum replication
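The interleaving above can be replayed against a toy block map. This is a 
minimal sketch under stated assumptions: BlockMapModel and its methods are 
hypothetical simplifications (not the real FSNamesystem code), and a block 
report is modelled as the full set of block IDs a DataNode claims, which the 
NameNode treats as authoritative for that node.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the race; all names are hypothetical.
class BlockMapModel {
    // block ID -> DataNodes the NameNode believes hold a replica
    private final Map<String, Set<String>> blockMap = new HashMap<>();

    // A DataNode reports that a single block has arrived.
    void blockReceived(String datanode, String blockId) {
        blockMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    // Full block report: anything mapped to this DataNode but missing from the
    // report is dropped, even if the report was built before the block arrived.
    void processReport(String datanode, Set<String> reported) {
        for (Map.Entry<String, Set<String>> e : blockMap.entrySet()) {
            if (!reported.contains(e.getKey())) {
                e.getValue().remove(datanode);
            }
        }
        for (String blockId : reported) {
            blockReceived(datanode, blockId);
        }
    }

    int replicas(String blockId) {
        return blockMap.getOrDefault(blockId, Set.of()).size();
    }

    // Replays the order seen in the log and returns the replica count the
    // NameNode is left with for the freshly written block.
    static int replayRace() {
        BlockMapModel nn = new BlockMapModel();
        String dn = "72.30.52.207:50010";
        String blk = "blk_391490390916935261";
        Set<String> staleReport = Set.of();  // 1. report snapshot taken while blk is in flight
        nn.blockReceived(dn, blk);           // 2. blockReceived / addStoredBlock
        nn.processReport(dn, staleReport);   // 3. stale report processed: removeStoredBlock
        return nn.replicas(blk);             // 4. 0 replicas < minimum of 1
    }

    public static void main(String[] args) {
        System.out.println("replicas after race: " + replayRace()); // prints "replicas after race: 0"
    }
}
```

Swapping steps 2 and 3 (report processed before the block arrives) leaves the 
block with its one replica, which is why the problem only shows up when the 
timing lines up.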

Other tells for this condition:
- The NRYE happens mostly in the Namenode benchmark (NNBench), which generates 
lots and lots and lots of blocks with exactly 1 replica. With n > 1 replicas, 
n block reports have to coincide with the client report for this to happen, 
since every replica must be dropped before the block falls below the minimum 
replication of 1.

- The NRYE shows up 1 hour into the benchmark, and 1 hour is of course the block 
report interval. We validated this by setting the block report interval to 5 
minutes, and sure enough NRYEs showed up within 5 minutes.
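To make the "n block reports have to coincide" argument concrete, assume 
(hypothetically) that any one DataNode's block report lands in a block's 
in-flight window with some small probability p, independently across 
DataNodes. Since every replica must be dropped, the per-block odds fall off as 
p^n. A back-of-envelope sketch; RaceOdds, p and the block count are 
illustrative inputs, not measurements from this cluster:

```java
// Back-of-envelope model; the independence assumption and the value of p are
// hypothetical, chosen only to show how fast the odds shrink with replication.
class RaceOdds {
    // Chance that all n replicas of one block are dropped by stale reports.
    static double perBlock(double p, int replicas) {
        return Math.pow(p, replicas);
    }

    // Expected number of blocks hit, over a run that writes `blocks` blocks.
    static double expectedHits(long blocks, double p, int replicas) {
        return blocks * perBlock(p, replicas);
    }

    public static void main(String[] args) {
        double p = 1e-3;            // hypothetical per-report race probability
        long blocks = 1_000_000L;   // hypothetical number of blocks written
        for (int n = 1; n <= 3; n++) {
            System.out.printf("replicas=%d  expected hits=%.3f%n", n, expectedHits(blocks, p, n));
        }
    }
}
```

Under these assumptions, going from 1 to 2 replicas cuts the expected count 
by a factor of 1/p, which fits the observation that the NRYE is mostly seen 
with single-replica files.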

We think the checksum patch caused this because it enforced a 1-byte block size 
on .crc files in NNBench. Previously, setting the block size on a file had no 
effect on its .crc file, so NNBench now creates many more tiny single-replica 
blocks and therefore many more chances to hit the race. This certainly needs 
fixing, but doesn't appear to be a blocker for 0.12.1.
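The effect of the block size on the number of blocks is plain ceiling 
division. A small sketch; BlockCount is a hypothetical name, the .crc length 
used in main is hypothetical (the thread doesn't give it), and 64 MB is assumed 
as the default block size for comparison:

```java
// Number of blocks a file occupies: ceil(fileLength / blockSize).
class BlockCount {
    static long blocksFor(long fileLength, long blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        return (fileLength + blockSize - 1) / blockSize; // integer ceiling division
    }

    public static void main(String[] args) {
        long dataFile = 1;                      // NNBench writes 1 byte
        long crcFile = 16;                      // hypothetical .crc length, illustration only
        long defaultBlock = 64L * 1024 * 1024;  // assumed default block size
        System.out.println("data file, 1-byte blocks:  " + blocksFor(dataFile, 1));
        System.out.println(".crc, default block size: " + blocksFor(crcFile, defaultBlock));
        System.out.println(".crc, 1-byte blocks:      " + blocksFor(crcFile, 1));
    }
}
```

With the block size applied to the .crc file, every byte of checksum data 
becomes its own single-replica block, each one another chance for the race.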

> NNBench generates millions of NotReplicatedYetException in Namenode log
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-1093
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1093
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Nigel Daley
>         Assigned To: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.12.1
>
>
> Running NNBench on latest trunk (0.12.1 candidate) on a few hundred nodes 
> yielded 2.3 million of these exceptions in the NN log:
>    2007-03-08 09:23:03,053 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error:
>    org.apache.hadoop.dfs.NotReplicatedYetException: Not replicated yet
>         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:803)
>         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:309)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
>
> I run NNBench to create files with block size set to 1 and replication set to 
> 1. NNBench then writes 1 byte to the file. Minimum replication for the 
> cluster is the default, i.e. 1. If it encounters an exception while trying to 
> do either the create or the write operation, it loops and tries again. 
> Multiply this by 1000 files per node and a few hundred nodes.
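The retry behaviour described above is what turns each failure into many log 
entries. A minimal sketch of such a loop; RetryLoop and attemptsUntilSuccess 
are hypothetical names, and NNBench's real loop may differ (for instance in 
whether it sleeps between attempts):

```java
import java.util.concurrent.Callable;

// Hypothetical retry loop: on any exception from the operation, try again.
class RetryLoop {
    // Returns the number of attempts needed. In the scenario above, each failed
    // attempt would correspond to one NRYE "call error" in the NameNode log.
    static int attemptsUntilSuccess(Callable<Void> op) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                op.call();
                return attempts;
            } catch (Exception e) {
                // Swallow and retry immediately, as described; no backoff here.
            }
        }
    }

    public static void main(String[] args) {
        int[] failuresLeft = {3}; // simulate three "Not replicated yet" failures
        int n = attemptsUntilSuccess(() -> {
            if (failuresLeft[0]-- > 0) throw new IllegalStateException("Not replicated yet");
            return null;
        });
        System.out.println("attempts: " + n); // prints "attempts: 4"
    }
}
```

With no backoff, a client stuck waiting on a block that the NameNode has 
already dropped keeps calling addBlock as fast as it can, and across 1000 files 
per node and a few hundred nodes that adds up quickly.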
