Sorry to re-send this. Referring to the comments in *org.apache.hadoop.mapred.LineRecordReader*: hadoop always reads one extra line in the next() method, which leads to this NPE.
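To make that concrete, here is a small self-contained simulation (just a sketch, not the actual Hadoop source; the line offsets and the split length 1284 are taken from the test output in the forwarded message below). With the old exclusive loop condition the reader stops at the end of the split; with the inclusive condition introduced by HADOOP-7823 it takes one extra pass and hands back an empty record at offset 1284:

import java.util.ArrayList;
import java.util.List;

public class SplitBoundaryDemo {

    // Start offsets of the lines in the split (only a few shown; "..." in the test output).
    static final long[] LINE_START = {0, 11, /* ... */ 1256, 1269};
    // Split length, i.e. the FileSplit length / FileStatus.getLen() in this test: 1284.
    static final long END = 1284;

    // Models the next() loop: inclusiveEnd=false is the hadoop 1.0.3 condition (pos < end),
    // inclusiveEnd=true is the hadoop 1.1.1 condition (getFilePosition() <= end).
    static List<String> read(boolean inclusiveEnd) {
        List<String> pairs = new ArrayList<>();
        long pos = 0;
        int line = 0;
        while (inclusiveEnd ? pos <= END : pos < END) {
            if (line < LINE_START.length) {
                pairs.add("<" + LINE_START[line] + ", Row #: " + (line + 1) + ">");
                pos = (line + 1 < LINE_START.length) ? LINE_START[line + 1] : END;
                line++;
            } else {
                // The extra pass allowed by "<=": nothing is left in the split, so the
                // value is empty; in the real reader this read is where the NPE shows up.
                pairs.add("<" + pos + ", >");
                break;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println("pos <  end : " + read(false));
        System.out.println("pos <= end : " + read(true));  // ends with the extra <1284, > pair
    }
}

The extra <1284, > pair matches the last entry in the "with NPE" listing in the forwarded message.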
So I think it's a defect in HCatalog, and filed it as https://issues.apache.org/jira/browse/HCATALOG-626

---------- Forwarded message ----------
From: Bing Li <[email protected]>
Date: 2013/3/5
Subject: TestReaderWriter failed with HADOOP-7823 (splitting support for bzip2)
To: [email protected]

Hi, All

With hcatalog 0.4.0, TestReaderWriter fails against hadoop 1.1.1 because of HADOOP-7823 (https://issues.apache.org/jira/browse/HADOOP-7823). It throws an NPE:

<testcase classname="org.apache.hcatalog.data.TestReaderWriter" name="test" time="6.932">
  <error type="java.lang.NullPointerException">java.lang.NullPointerException
    at org.apache.hadoop.fs.BufferedFSInputStream.getPos(BufferedFSInputStream.java:48)
    at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:41)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:219)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
    at java.io.DataInputStream.read(DataInputStream.java:94)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:176)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:43)
    at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:188)
    at org.apache.hcatalog.data.transfer.impl.HCatInputFormatReader$HCatRecordItr.hasNext(HCatInputFormatReader.java:107)
    at org.apache.hcatalog.data.TestReaderWriter.runsInSlave(TestReaderWriter.java:139)
    at org.apache.hcatalog.data.TestReaderWriter.test(TestReaderWriter.java:104)
  </error>
</testcase>

An important class is *org.apache.hadoop.mapred.LineRecordReader*. Its *next()* method loops over the split as follows:

- hadoop 1.1.1: while (getFilePosition() *<= end*) {...}
- hadoop 1.0.3 (without HADOOP-7823): while (pos *< end*) {...}

The value of *end* comes from the length of the FileSplit object, which in this case is 1284: the length of the split in HDFS, i.e. *org.apache.hadoop.fs.FileStatus.getLen()*.

*The <key, value> pairs change as well:*

- *without* NPE:
  <0, >
  <0, Row #: 1 1>
  <11, Row #: 2 2>
  ...
  <1256, Row #: 99 99>
  <1269, Row #: 100 100>
  <1269, Row #: 100 100>

- with NPE (with HADOOP-7823):
  <0, >
  <0, Row #: 1 1>
  <11, Row #: 2 2>
  ...
  <1256, Row #: 99 99>
  <1269, Row #: 100 100>
  <1284, >

*Is it a bug in HCatalog?*

Thanks,
- Bing
