[ https://issues.apache.org/jira/browse/HDFS-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923075#action_12923075 ]

Konstantin Boudnik commented on HDFS-1459:
------------------------------------------

Thanks for opening a new JIRA, Hajo. One thing: in the future, please try to 
limit the 'Description' field to a short, self-explanatory diagnosis of the 
problem and post any error messages, code snippets, etc. as comments.

> NullPointerException in DataInputStream.readInt
> -----------------------------------------------
>
>                 Key: HDFS-1459
>                 URL: https://issues.apache.org/jira/browse/HDFS-1459
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>         Environment: Debian 64 bit
> Cloudera Hadoop
>            Reporter: Hajo Nils Krabbenhöft
>
> First, here's my source code accessing the HDFS:
> final FSDataInputStream indexFile = getFile(bucketPathStr, Integer.toString(hashTableId) + ".index");
> indexFile.seek(bucketId * 4);
> int bucketStart = ByteSwapper.swap(indexFile.readInt());
> int bucketEnd = ByteSwapper.swap(indexFile.readInt());
>
> final FSDataInputStream dataFile = getFile(bucketPathStr, Integer.toString(hashTableId) + ".data");
> dataFile.seek(bucketStart * (2 + Hasher.getConfigHashLength()) * 4);
> for (int hash = bucketStart; hash < bucketEnd; hash++) {
>     int RimageIdA = ByteSwapper.swap(dataFile.readInt());
>     int RimageIdB = ByteSwapper.swap(dataFile.readInt());
>     // ... read the hash of length Hasher.getConfigHashLength() and work with it ...
> }
> As you can see, I read the range to be processed from an X.index file and 
> then read those rows from X.data. The index file is always exactly 6,710,888 
> bytes long.
> As for the data file, everything works fine with 50 different 1.35 GB 
> (22-block) data files, and it fails every time with 50 different 2.42 GB 
> (39-block) data files. So the bug clearly depends on the file size.
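> For reference, here is a minimal standalone reader that reproduces the same 
> seek/readInt pattern outside MapReduce. It is only a sketch: it assumes the 
> default FileSystem in the Configuration points at the cluster, and the path 
> to one of the failing .data files is passed on the command line.
>
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class SeekReadRepro {
>     public static void main(String[] args) throws IOException {
>         Configuration conf = new Configuration();
>         FileSystem fs = FileSystem.get(conf);
>         Path data = new Path(args[0]); // e.g. one of the 2.42 GB .data files
>         long fileLen = fs.getFileStatus(data).getLen();
>         FSDataInputStream in = fs.open(data);
>         try {
>             // Read one int every 64 MB so every (default-sized) block boundary is crossed.
>             for (long pos = 0; pos + 4 <= fileLen; pos += 64L * 1024 * 1024) {
>                 in.seek(pos);
>                 System.out.println("pos=" + pos + " value=" + in.readInt());
>             }
>         } finally {
>             in.close();
>         }
>     }
> }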
> I checked ulimit and the number of network connections, and neither is maxed 
> out when the error occurs. The stack trace I get is:
> java.lang.NullPointerException
>       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1703)
>       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1755)
>       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1680)
>       at java.io.DataInputStream.readInt(DataInputStream.java:370)
> ...
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>       at org.apache.hadoop.mapred.Child.main(Child.java:170)
> which leads me to believe that DFSClient.blockSeekTo returns a non-null 
> chosenNode but leaves blockReader = null.
> Since the exact same jar works flawlessly with small data files and fails 
> reliably with big data files, I'm wondering how this could possibly depend 
> on the file's size or block count (DFSClient.java line 1628+):
> s = socketFactory.createSocket();
> NetUtils.connect(s, targetAddr, socketTimeout);
> s.setSoTimeout(socketTimeout);
> Block blk = targetBlock.getBlock();
> blockReader = BlockReader.newBlockReader(s, src, blk.getBlockId(), 
>     blk.getGenerationStamp(),
>     offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
>     buffersize, verifyChecksum, clientName);
> return chosenNode;
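> Since blockSeekTo appears to come back with blockReader == null, a purely 
> client-side stopgap would be to reopen the stream and retry the read. The 
> sketch below uses only the public FileSystem/FSDataInputStream API; the 
> class name RetryingReader and the method readIntAt are hypothetical.
>
> import java.io.IOException;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class RetryingReader {
>     private final FileSystem fs;
>     private final Path path;
>
>     public RetryingReader(FileSystem fs, Path path) {
>         this.fs = fs;
>         this.path = path;
>     }
>
>     // Reads one big-endian int at the given byte offset, reopening the
>     // stream and retrying when the read dies inside the DFS client.
>     public int readIntAt(long offset, int maxAttempts) throws IOException {
>         IOException last = null;
>         for (int attempt = 0; attempt < maxAttempts; attempt++) {
>             FSDataInputStream in = fs.open(path);
>             try {
>                 in.seek(offset);
>                 return in.readInt();
>             } catch (NullPointerException e) {
>                 // The NPE from DFSInputStream.readBuffer surfaces here;
>                 // reopen and try again, hopefully hitting another replica.
>                 last = new IOException("read failed at offset " + offset
>                         + " after " + (attempt + 1) + " attempt(s): " + e);
>             } finally {
>                 in.close();
>             }
>         }
>         throw last != null ? last : new IOException("read never attempted");
>     }
> }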

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
