taking values at runtime (I get them through exceptions thrown when the result is 0, and print out the values).
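The capture-through-exceptions approach can be sketched as follows. This is a minimal, hypothetical illustration, not the actual instrumented Hadoop code: `ReadInstrumentation`, `readBuffer`, `read` and the `available` parameter are made-up stand-ins. The callee throws when a read yields 0 bytes. The caller catches that exception and rethrows it wrapped, so one stack trace reports the values seen at both levels.

```java
/**
 * Hypothetical sketch (not Hadoop code): the callee throws on a 0-byte read,
 * and the caller wraps and rethrows, so a single trace shows both sets of values.
 */
public class ReadInstrumentation {

    /** Callee stand-in for readBuffer(); 'available' simulates how many bytes the stream can supply. */
    public static int readBuffer(long pos, int len, int bytesPerSum, int available) {
        int read = Math.max(0, Math.min(len, available));
        if (read == 0) {
            // Fail loudly with the callee's view of the state instead of silently returning 0.
            throw new RuntimeException("end of read() pos=" + pos + " len=" + len
                    + " bytesPerSum=" + bytesPerSum + " read=" + read);
        }
        return read;
    }

    /** Caller stand-in: adds its own values and keeps the callee's exception as the cause. */
    public static int read(long pos, int len, int bytesPerSum, int available) {
        try {
            return readBuffer(pos, len, bytesPerSum, available);
        } catch (RuntimeException e) {
            throw new RuntimeException("end of read() len=" + len + " pos=" + pos, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(0L, 100, 512, 50)); // normal partial read
        try {
            read(45223932L, 127, 512, 0); // simulate a read that yields 0 bytes
        } catch (RuntimeException e) {
            // Prints the caller's message, then "Caused by:" with the callee's values.
            e.printStackTrace();
        }
    }
}
```

Chaining the original exception as the cause is what makes the JVM print the "Caused by:" section. That keeps the callee's frames and values in the same report as the caller's.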
The \r\n problem was observed on the 0.13.0 release. To study the behavior, I instrumented the Hadoop source from the head of the tree. Attached are two sample stacks. (I have readBuffer throw when it gets 0 bytes, and have that exception caught one level up and rethrown wrapped, so both are reported. This way, I capture the values from both the caller and the callee.)

On a separate note: if the assumption len >= bytesPerSum exists, would it be OK to throw an exception when it is violated? Most of the time (e.g., in crawling/indexing), people won't notice that part of the input data is being thrown away. Failing loudly would make debugging much easier as the code changes (and the assumption gets violated). The cost here is probably small, since most of the cost is likely in the network and disk I/O.

bwolen

-------------------------------------
java.lang.RuntimeException: end of read() in=org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker len=127 pos=45223932 res=-999999
    at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:50)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.hadoop.fs.FSDataInputStream$Buffer.read(FSDataInputStream.java:116)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:132)
    at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:124)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:108)
    at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:168)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:44)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:186)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1720)
Caused by: java.lang.RuntimeException: end of read() datas=org.apache.hadoop.dfs.DFSClient$DFSDataInputStream pos=45223932 len=-381 bytesPerSum=512 eof=false read=0
    at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:200)
    at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:175)
    at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:47)
    ... 11 more
-------------------------
java.lang.RuntimeException: end of read() in=org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker len=127 pos=45223932 res=-999999
    at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:50)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.hadoop.fs.FSDataInputStream$Buffer.read(FSDataInputStream.java:116)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:132)
    at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:124)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:108)
    at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:168)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:44)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:186)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1720)
Caused by: java.lang.RuntimeException: end of read() datas=org.apache.hadoop.dfs.DFSClient$DFSDataInputStream pos=45223932 len=-381 bytesPerSum=512 eof=false read=0
    at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:200)
    at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:175)
    at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:47)
    ... 11 more
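The guard suggested in the separate note (throw when len >= bytesPerSum is violated, rather than silently dropping input) might look roughly like this. It is a hypothetical sketch. `ChecksumAssumptionCheck` and `checkReadLength` are made-up names, and it assumes, as the comment suggests, that readBuffer really requires len >= bytesPerSum. If a short final chunk of a file is legitimate, the condition would need to allow for that case.

```java
/**
 * Hypothetical sketch (not Hadoop code): fail loudly when the
 * len >= bytesPerSum assumption is violated, instead of returning 0 bytes
 * and letting part of the input be silently skipped.
 */
public class ChecksumAssumptionCheck {

    /** Throws if the requested length is smaller than one checksum chunk. */
    public static void checkReadLength(int len, int bytesPerSum, long pos) {
        if (len < bytesPerSum) {
            throw new IllegalArgumentException("read length " + len
                    + " is smaller than bytesPerSum=" + bytesPerSum + " at pos=" + pos);
        }
    }

    public static void main(String[] args) {
        checkReadLength(1024, 512, 0L); // satisfies the assumption: no exception
        try {
            checkReadLength(-381, 512, 45223932L); // the len/pos values from the traces above
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The check is a single integer comparison per call, which supports the point that its cost is negligible next to network and disk I/O. An `IOException` could be used instead if the guard lives inside a method that already declares it.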