[ https://issues.apache.org/jira/browse/HADOOP-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HADOOP-8423:
--------------------------------

    Attachment: hadoop-8423.txt

The attached patch should fix the issue. I only tested with Snappy, not LZO, so please let me know if LZO doesn't work.

The issue was that BlockDecompressorStream wasn't resetting its own state when resetState() was called. So, when the SequenceFile.Reader reseeked, the stream would get "out of sync": it would be at the beginning of a block but think it was in the middle of one, and the codec was fed invalid data. (A rough sketch of the idea follows the quoted description below.)

> MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data
> -----------------------------------------------------------------------------------------------
>
> Key: HADOOP-8423
> URL: https://issues.apache.org/jira/browse/HADOOP-8423
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Affects Versions: 0.20.2
> Environment: Linux 2.6.32.23-0.3-default #1 SMP 2010-10-07 14:57:45 +0200 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Jason B
> Assignee: Todd Lipcon
> Attachments: MapFileCodecTest.java, hadoop-8423.txt
>
> I am using the Cloudera distribution cdh3u1.
> When trying out native codecs such as Snappy or LZO for better decompression performance, I ran into issues with random access using the MapFile.Reader.get(key, value) method. The first call of MapFile.Reader.get() works, but a second call fails.
> I am also getting different exceptions depending on the number of entries in the map file. With LzoCodec and a 10-record file, the JVM aborts. At the same time, DefaultCodec works fine in all cases, as does record compression with the native codecs.
> I created a simple test program (attached) that creates map files locally with sizes of 10 and 100 records for three codecs: Default, Snappy, and LZO. (The test requires the corresponding native libraries to be available.)
> A summary of the problems is given below:
>
> Map Size: 100
> Compression: RECORD
> ===================
> DefaultCodec: OK
> SnappyCodec: OK
> LzoCodec: OK
>
> Map Size: 10
> Compression: RECORD
> ===================
> DefaultCodec: OK
> SnappyCodec: OK
> LzoCodec: OK
>
> Map Size: 100
> Compression: BLOCK
> ===================
> DefaultCodec: OK
> SnappyCodec: java.io.EOFException at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:114)
> LzoCodec: java.io.EOFException at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:114)
>
> Map Size: 10
> Compression: BLOCK
> ===================
> DefaultCodec: OK
> SnappyCodec: java.lang.NoClassDefFoundError: Ljava/lang/InternalError at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
> LzoCodec:
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00002b068ffcbc00, pid=6385, tid=47304763508496
> #
> # JRE version: 6.0_21-b07
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b17 mixed mode linux-amd64)
> # Problematic frame:
> # C  [liblzo2.so.2+0x13c00]  lzo1x_decompress+0x1a0
> #
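For reference, a rough sketch of the kind of resetState() override described in the comment above. This is not the attached hadoop-8423.txt patch: the class name PatchedBlockDecompressorStream is hypothetical, and the fields originalBlockSize and noUncompressedBytes are assumptions about how BlockDecompressorStream tracks its position within the current block. It is written as a standalone subclass of DecompressorStream only so the snippet compiles on its own; the real change would live in BlockDecompressorStream itself.

{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.compress.DecompressorStream;

// Sketch only -- illustrates clearing the stream's own block-tracking state
// in resetState(), in addition to resetting the wrapped Decompressor.
class PatchedBlockDecompressorStream extends DecompressorStream {

  private int originalBlockSize;    // assumed field: uncompressed size of the current block
  private int noUncompressedBytes;  // assumed field: bytes of that block already returned

  PatchedBlockDecompressorStream(InputStream in, Decompressor decompressor)
      throws IOException {
    super(in, decompressor);
  }

  @Override
  public void resetState() throws IOException {
    // Forget any partially consumed block, so that after SequenceFile.Reader
    // reseeks and calls resetState(), the next read starts at a block boundary
    // instead of the stream believing it is still mid-block.
    originalBlockSize = 0;
    noUncompressedBytes = 0;
    // Still reset the underlying Decompressor, as DecompressorStream does.
    super.resetState();
  }
}
{code}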