[ 
https://issues.apache.org/jira/browse/HDFS-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hajo Nils Krabbenhöft updated HDFS-1459:
----------------------------------------

    Description: 
First, here's my source code accessing the HDFS:


final FSDataInputStream indexFile = getFile(bucketPathStr,
        Integer.toString(hashTableId) + ".index");
indexFile.seek(bucketId * 4);
int bucketStart = ByteSwapper.swap(indexFile.readInt());
int bucketEnd = ByteSwapper.swap(indexFile.readInt());

final FSDataInputStream dataFile = getFile(bucketPathStr,
        Integer.toString(hashTableId) + ".data");
dataFile.seek(bucketStart * (2 + Hasher.getConfigHashLength()) * 4);

for (int hash = bucketStart; hash < bucketEnd; hash++) {
        int RimageIdA = ByteSwapper.swap(dataFile.readInt());
        int RimageIdB = ByteSwapper.swap(dataFile.readInt());
        // ... read hash of length Hasher.getConfigHashLength() and work with it ...
}


As you can see, I read the range to be processed from an X.index file and then
read those rows from X.data. The index file is always exactly 6,710,888 bytes
in length.

As for the data file, everything works fine with 50 different 1.35 GB (22-block)
data files, and it fails every time I try with 50 different 2.42 GB (39-block)
data files. So the bug is clearly dependent on the file size.
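As an aside, the ByteSwapper.swap calls above presumably just reverse byte order, since DataInputStream.readInt is big-endian while the files appear to have been written little-endian. A minimal standalone equivalent (class and method names hypothetical, not the actual ByteSwapper source) would be:

```java
public class ByteSwapDemo {
    // Hypothetical equivalent of ByteSwapper.swap(int): reverse the
    // byte order of a 32-bit value returned by big-endian readInt().
    static int swap(int v) {
        return Integer.reverseBytes(v);
    }

    public static void main(String[] args) {
        // 0x01020304 stored little-endian comes back from readInt()
        // as 0x04030201; swapping restores the original value.
        System.out.println(Integer.toHexString(swap(0x04030201)));
    }
}
```

Compiling and running this prints 1020304.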

I checked ulimit and the number of network connections, and neither is maxed
out when the error occurs. The stack trace I get is:

java.lang.NullPointerException
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1703)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1755)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1680)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
...
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

which leads me to believe that DFSClient.blockSeekTo returns with a non-null
chosenNode but with blockReader = null.

Since the exact same jar works flawlessly with small data files and fails
reliably with big data files, I'm wondering how this could possibly depend
on the file's size or block count (DFSClient.java, line 1628+):

s = socketFactory.createSocket();
NetUtils.connect(s, targetAddr, socketTimeout);
s.setSoTimeout(socketTimeout);
Block blk = targetBlock.getBlock();

blockReader = BlockReader.newBlockReader(s, src, blk.getBlockId(), 
    blk.getGenerationStamp(),
    offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
    buffersize, verifyChecksum, clientName);
return chosenNode;



        Summary: NullPointerException in DataInputStream.readInt caused by 
reaching xceiverCount  (was: NullPointerException in DataInputStream.readInt)

The NPE on the client side is caused by this error on the datanode side:


2010-10-20 15:31:17,177 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.17.5.3:50010, storageID=DS-266784496-78.46.65.54-50010-1287004808819, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 257 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)


> NullPointerException in DataInputStream.readInt caused by reaching 
> xceiverCount
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-1459
>                 URL: https://issues.apache.org/jira/browse/HDFS-1459
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>         Environment: Debian 64 bit
> Cloudera Hadoop
>            Reporter: Hajo Nils Krabbenhöft
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
