[
https://issues.apache.org/jira/browse/HADOOP-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616442#action_12616442
]
Peter Voss commented on HADOOP-3821:
------------------------------------
Right. The patch won't apply to branch-0.17.
If the 0.18 release is close, that should be fine as well. I would still
prefer a patch for 0.17, though, because we are going to ship a new release of
our product that uses Hadoop 0.17 soon, and I would like to avoid the risks
that a switch to 0.18 would bring at that stage of the release process. Anyway,
if our QA doesn't want a 0.18 upgrade for this release, I can patch the
0.17 branch myself, and we will switch to 0.18 for our next release.
Interesting that 0.17.1 is the latest stable release; I don't see it at
http://apache.cs.utah.edu/hadoop/core/stable/
Thanks a lot for your really quick progress on this issue.
> SequenceFile's Reader.decompressorPool or Writer.decompressorPool gets into
> an inconsistent state when calling close() more than once
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-3821
> URL: https://issues.apache.org/jira/browse/HADOOP-3821
> Project: Hadoop Core
> Issue Type: Bug
> Components: io
> Affects Versions: 0.17.0, 0.17.1, 0.18.0, 0.19.0
> Reporter: Peter Voss
> Assignee: Arun C Murthy
> Fix For: 0.19.0
>
> Attachments: HADOOP-3821_0_20080724.patch
>
>
> SequenceFile.Reader uses a decompressorPool to reuse Decompressor instances.
> The Reader obtains such an instance from the pool on object creation and
> returns it to the pool when {{close()}} is called.
> SequenceFile.Reader implements the {{java.io.Closeable}} interface, and its
> spec for the {{close()}} method says:
> {quote}
> Closes this stream and releases any system resources associated
> with it. If the stream is already closed then invoking this
> method has no effect.
> {quote}
> The Reader implementation violates this spec, because calling {{close()}}
> more than once has serious consequences.
> When you call {{close()}} twice, the same Decompressor instances are
> returned to the pool twice, so the pool ends up holding duplicate
> references to them. When other Readers later request instances from the
> pool, two Readers can end up with the same Decompressor instance.
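>
> To make the failure mode concrete, here is a toy illustration (plain Java
> collections, not the actual Hadoop pool code) of what happens when the same
> object is returned to a list-backed pool twice:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> public class DuplicatePoolDemo {
>     public static void main(String[] args) {
>         List<Object> pool = new ArrayList<Object>();
>         Object decompressor = new Object(); // stands in for a Decompressor
>
>         pool.add(decompressor); // first close() returns the instance
>         pool.add(decompressor); // second close() returns the SAME instance again
>
>         Object forReader1 = pool.remove(0); // one reader borrows from the pool
>         Object forReader2 = pool.remove(0); // another reader borrows from the pool
>
>         // Both readers now share one instance and corrupt each other's state.
>         System.out.println(forReader1 == forReader2); // prints: true
>     }
> }
> {code}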
> The correct behavior would be simply to ignore a second call to {{close()}},
> as sketched below.
> The exact same issue applies to SequenceFile.Writer as well.
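>
> A minimal sketch of such an idempotent {{close()}} (the {{closed}} flag and
> the wrapper class are hypothetical, not the actual SequenceFile internals):
> {code:java}
> import java.io.Closeable;
> import java.io.IOException;
>
> // Hypothetical reader wrapper honoring the java.io.Closeable contract:
> // the second and all later calls to close() have no effect.
> public class GuardedReader implements Closeable {
>     private boolean closed = false;
>
>     public synchronized void close() throws IOException {
>         if (closed) {
>             return; // already closed: do not touch the pool again
>         }
>         closed = true;
>         // Return the pooled Decompressor and close the underlying stream
>         // here, so the instance goes back to the pool exactly once.
>     }
> }
> {code}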
> We were having big trouble with this, because we were observing sporadic
> exceptions from merge operations. The strange thing was that executing the
> same merge again usually succeeded, but sometimes it took multiple attempts
> to complete a merge successfully. It was very hard to trace the root cause
> back to the duplicated Decompressor references in the decompressorPool.
> The exceptions that we observed in production looked like this (we were using
> Hadoop 0.17.0):
> {noformat}
> java.io.IOException: unknown compression method
>     at org.apache.hadoop.io.compress.zlib.BuiltInZlibInflater.decompress(BuiltInZlibInflater.java:47)
>     at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:80)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:56)
>     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>     at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1995)
>     at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>     at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2760)
>     at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2625)
>     at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2644)
> {noformat}
> or
> {noformat}
> java.io.IOException: zero length keys not allowed
>     at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.appendRaw(SequenceFile.java:1340)
>     at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2626)
>     at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2644)
> {noformat}
> The following snippet reproduces the problem:
> {code:java}
> import java.io.IOException;
>
> import junit.framework.TestCase;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.LocalFileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.NullWritable;
> import org.apache.hadoop.io.SequenceFile;
> import org.apache.hadoop.io.SequenceFile.CompressionType;
> import org.apache.hadoop.io.Text;
>
> public class CodecPoolTest extends TestCase {
>
>     public void testCodecPool() throws IOException {
>         Configuration conf = new Configuration();
>         LocalFileSystem fs = new LocalFileSystem();
>         fs.setConf(conf);
>         fs.getRawFileSystem().setConf(conf);
>
>         // create a block-compressed sequence file with two keys
>         Path path = new Path("target/seqFile");
>         SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
>                 path, Text.class, NullWritable.class, CompressionType.BLOCK);
>         writer.append(new Text("key1"), NullWritable.get());
>         writer.append(new Text("key2"), NullWritable.get());
>         writer.close();
>
>         // Create a reader which uses 4 BuiltInZlibInflater instances
>         SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
>         // Returns the 4 BuiltInZlibInflater instances to the CodecPool
>         reader.close();
>         // The second close erroneously returns the same 4 BuiltInZlibInflater
>         // instances to the CodecPool again
>         reader.close();
>
>         // The first reader gets 4 BuiltInZlibInflater instances from the
>         // CodecPool
>         SequenceFile.Reader reader1 = new SequenceFile.Reader(fs, path, conf);
>         // read first value from reader1
>         Text text = new Text();
>         reader1.next(text);
>         assertEquals("key1", text.toString());
>
>         // The second reader gets the same 4 BuiltInZlibInflater instances
>         // from the CodecPool as reader1
>         SequenceFile.Reader reader2 = new SequenceFile.Reader(fs, path, conf);
>         // read first value from reader2
>         reader2.next(text);
>         assertEquals("key1", text.toString());
>
>         // read second value from reader1
>         reader1.next(text);
>         assertEquals("key2", text.toString());
>         // read second value from reader2 (this throws an exception)
>         reader2.next(text);
>         assertEquals("key2", text.toString());
>
>         assertFalse(reader1.next(text));
>         assertFalse(reader2.next(text));
>     }
> }
> {code}
> It fails with the exception:
> {noformat}
> java.io.EOFException
>     at java.io.DataInputStream.readByte(DataInputStream.java:243)
>     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:324)
>     at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:345)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1835)
>     at CodecPoolTest.testCodecPool(CodecPoolTest.java:56)
> {noformat}
> But this is just a very simple test that shows the problem. Much weirder
> things can happen when running in a complex production environment, and
> heavy concurrency makes the behavior even more exciting. ;-)