[ https://issues.apache.org/jira/browse/HDFS-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924306#action_12924306 ]

Hajo Nils Krabbenhöft commented on HDFS-1459:
---------------------------------------------

I found this in my datanode logs:

2010-10-20 15:31:17,154 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.17.5.3:50010, 
storageID=DS-266784496-78.46.65.54-50010-1287004808819, infoPort=50075, 
ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 257 exceeds the limit of concurrent xcievers 
256
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)

2010-10-20 15:31:19,115 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.17.5.3:50010, 
storageID=DS-266784496-78.46.65.54-50010-1287004808819, infoPort=50075, 
ipcPort=50020):Got exception while serving blk_-8099607957427967059_1974 to 
/10.17.5.4:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
local=/10.17.5.3:50010 remote=/10.17.5.4:51336]
        at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

So far, using this configuration snippet seems to fix the problem:

<property>
  <name>dfs.datanode.handler.count</name>
  <value>40</value>
  <description>The number of server threads for the datanode.</description>
</property>

<property>
  <name>dfs.namenode.handler.count</name>
  <value>40</value>
  <description>The number of server threads for the namenode.</description>
</property>

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value>
  <description>The maximum number of threads that can be connected to a
datanode simultaneously. Default value is 256.
  </description>
</property>
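These properties go into conf/hdfs-site.xml and only take effect after the datanodes (and, for dfs.namenode.handler.count, the namenode) are restarted. As a quick sanity check, a throwaway snippet along these lines (hypothetical class name; run on a datanode host with its conf directory on the classpath) prints the effective limit:

import org.apache.hadoop.conf.Configuration;

public class PrintXceiverLimit {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // hdfs-site.xml may not be loaded as a default resource in 0.20,
    // so add it explicitly from the classpath
    conf.addResource("hdfs-site.xml");
    // 256 is the default reported in the log message above
    System.out.println("dfs.datanode.max.xcievers = "
        + conf.getInt("dfs.datanode.max.xcievers", 256));
  }
}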


So the underlying problem seems to be that when the xceiver limit is reached,
the client is not notified and therefore reports unusable error messages.
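
Until that is fixed, the only client-side workaround I can think of is to retry the read from scratch when it dies with the NullPointerException described in this issue. A rough sketch (hypothetical class; path and retry parameters invented for illustration), using only the standard FileSystem API:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RetryingIntReader {

  /** Re-opens the file and reads one int at the given offset, retrying on failure. */
  public static int readIntAt(FileSystem fs, Path file, long offset, int maxAttempts)
      throws IOException {
    IOException last = new IOException("no attempt made");
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        FSDataInputStream in = fs.open(file);
        try {
          in.seek(offset);
          return in.readInt();
        } finally {
          in.close();
        }
      } catch (NullPointerException npe) {
        // the symptom from this issue: readBuffer() throws a NullPointerException
        // when the block reader could not be set up, presumably because the
        // datanode ran out of xceivers
        last = new IOException("read failed at offset " + offset + ": " + npe);
      } catch (IOException ioe) {
        last = ioe;
      }
      try {
        Thread.sleep(1000L * attempt); // simple linear back-off between attempts
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    // invented path, mirroring the X.index layout from the issue description below
    System.out.println("first int = " + readIntAt(fs, new Path("/buckets/0.index"), 0L, 3));
  }
}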

> NullPointerException in DataInputStream.readInt
> -----------------------------------------------
>
>                 Key: HDFS-1459
>                 URL: https://issues.apache.org/jira/browse/HDFS-1459
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>         Environment: Debian 64 bit
> Cloudera Hadoop
>            Reporter: Hajo Nils Krabbenhöft
>
> First, here's my source code accessing the HDFS:
> final FSDataInputStream indexFile = getFile(bucketPathStr, Integer.toString(hashTableId) + ".index");
> indexFile.seek(bucketId * 4);
> int bucketStart = ByteSwapper.swap(indexFile.readInt());
> int bucketEnd = ByteSwapper.swap(indexFile.readInt());
> final FSDataInputStream dataFile = getFile(bucketPathStr, Integer.toString(hashTableId) + ".data");
> dataFile.seek(bucketStart * (2 + Hasher.getConfigHashLength()) * 4);
> for (int hash = bucketStart; hash < bucketEnd; hash++) {
>     int RimageIdA = ByteSwapper.swap(dataFile.readInt());
>     int RimageIdB = ByteSwapper.swap(dataFile.readInt());
>     // ... read hash of length Hasher.getConfigHashLength() and work with it ...
> }
> As you can see, I read the bucket range from an X.index file and then read
> those rows from X.data. The index file is always exactly 6,710,888 bytes long.
> As for the data file, everything works fine with 50 different 1.35 GB (22-block)
> data files, and it fails every time with 50 different 2.42 GB (39-block) data
> files. So the cause of the bug clearly depends on the file size.
> I checked ulimit and the number of network connections, and neither is maxed
> out when the error occurs. The stack trace I get is:
> java.lang.NullPointerException
>       at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1703)
>       at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1755)
>       at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1680)
>       at java.io.DataInputStream.readInt(DataInputStream.java:370)
> ...
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>       at org.apache.hadoop.mapred.Child.main(Child.java:170)
> which leads me to believe that DFSClient.blockSeekTo returns with a non-null 
> chosenNode but with blockReader = null.
> Since the exact same jar works flawlessly with small data files and fails
> reliably with big data files, I'm wondering how this could possibly depend on
> the file's size or block count (DFSClient.java line 1628+):
> s = socketFactory.createSocket();
> NetUtils.connect(s, targetAddr, socketTimeout);
> s.setSoTimeout(socketTimeout);
> Block blk = targetBlock.getBlock();
> blockReader = BlockReader.newBlockReader(s, src, blk.getBlockId(), 
>     blk.getGenerationStamp(),
>     offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
>     buffersize, verifyChecksum, clientName);
> return chosenNode;
