Sorry to re-send this. Referring to the comments in *org.apache.hadoop.mapred.LineRecordReader*: hadoop always reads one extra line in the next() method, which leads to this NPE.
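To make that concrete, here is a small self-contained simulation (just a sketch, not the actual Hadoop source; the line offsets and the split length 1284 are taken from the test output in the forwarded message below). With the old exclusive loop condition the reader stops at the end of the split; with the inclusive condition introduced by HADOOP-7823 it takes one extra pass and hands back an empty record at offset 1284:

import java.util.ArrayList;
import java.util.List;

public class SplitBoundaryDemo {

    // Start offsets of the lines in the split (only a few shown; "..." in the test output).
    static final long[] LINE_START = {0, 11, /* ... */ 1256, 1269};
    // Split length, i.e. the FileSplit length / FileStatus.getLen() in this test: 1284.
    static final long END = 1284;

    // Models the next() loop: inclusiveEnd=false is the hadoop 1.0.3 condition (pos < end),
    // inclusiveEnd=true is the hadoop 1.1.1 condition (getFilePosition() <= end).
    static List<String> read(boolean inclusiveEnd) {
        List<String> pairs = new ArrayList<>();
        long pos = 0;
        int line = 0;
        while (inclusiveEnd ? pos <= END : pos < END) {
            if (line < LINE_START.length) {
                pairs.add("<" + LINE_START[line] + ", Row #: " + (line + 1) + ">");
                pos = (line + 1 < LINE_START.length) ? LINE_START[line + 1] : END;
                line++;
            } else {
                // The extra pass allowed by "<=": nothing is left in the split, so the
                // value is empty; in the real reader this read is where the NPE shows up.
                pairs.add("<" + pos + ", >");
                break;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println("pos <  end : " + read(false));
        System.out.println("pos <= end : " + read(true));  // ends with the extra <1284, > pair
    }
}

The extra <1284, > pair matches the last entry in the "with NPE" listing in the forwarded message.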
So I think it's a defect in HCatalog, and filed it as https://issues.apache.org/jira/browse/HCATALOG-626

---------- Forwarded message ----------
From: Bing Li <[email protected]>
Date: 2013/3/5
Subject: TestReaderWriter failed with HADOOP-7823 (splitting support for bzip2)
To: [email protected]

Hi, All

With hcatalog 0.4.0, TestReaderWriter fails against hadoop 1.1.1 because of HADOOP-7823 (https://issues.apache.org/jira/browse/HADOOP-7823). It throws an NPE:

<testcase classname="org.apache.hcatalog.data.TestReaderWriter" name="test" time="6.932">
  <error type="java.lang.NullPointerException">java.lang.NullPointerException
    at org.apache.hadoop.fs.BufferedFSInputStream.getPos(BufferedFSInputStream.java:48)
    at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:41)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:219)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
    at java.io.DataInputStream.read(DataInputStream.java:94)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:176)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:43)
    at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:188)
    at org.apache.hcatalog.data.transfer.impl.HCatInputFormatReader$HCatRecordItr.hasNext(HCatInputFormatReader.java:107)
    at org.apache.hcatalog.data.TestReaderWriter.runsInSlave(TestReaderWriter.java:139)
    at org.apache.hcatalog.data.TestReaderWriter.test(TestReaderWriter.java:104)
  </error>
</testcase>

An important class is *org.apache.hadoop.mapred.LineRecordReader*. Its *next()* method loops over the split as follows:

- hadoop 1.1.1: while (getFilePosition() *<= end*) {...}
- hadoop 1.0.3 (without HADOOP-7823): while (pos *< end*) {...}

The value of *end* comes from the length of the FileSplit object, which in this case is 1284: the length of the split in HDFS, i.e. *org.apache.hadoop.fs.FileStatus.getLen()*.

*The <key, value> pairs change as well:*

- *without* NPE:
  <0, >
  <0, Row #: 1 1>
  <11, Row #: 2 2>
  ...
  <1256, Row #: 99 99>
  <1269, Row #: 100 100>
  <1269, Row #: 100 100>

- with NPE (with HADOOP-7823):
  <0, >
  <0, Row #: 1 1>
  <11, Row #: 2 2>
  ...
  <1256, Row #: 99 99>
  <1269, Row #: 100 100>
  <1284, >

*Is it a bug in HCatalog?*

Thanks,
- Bing
