[
https://issues.apache.org/jira/browse/TEZ-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204003#comment-17204003
]
László Bodor edited comment on TEZ-4234 at 9/29/20, 5:58 PM:
-------------------------------------------------------------
[~jeagles]: could you please take a look at [^TEZ-4234.03.patch]? This now
seems to be a regression from TEZ-4135 (cc: [~rajesh.balamohan]).
1a. The actual fix is one line; please see my explanation in the code comment
there. Basically, it sets the codec's buffer size config back to its original
value, so that compressor/decompressor instances do not end up with the small,
problematic buffer size:
{code}
configurableCodec.getConf().setInt(bufferSizeProp, originalSize);
{code}
1b. The modified unit test (TestIFile#testInMemoryBufferSize) shows that the
codec's buffer size config is left unchanged after the codec has been used.
2. getDecompressedInputStreamWithBufferSize has been moved to CodecUtils,
together with the fix.
3. CodecUtils.getCodec(conf) has been introduced to get rid of repeated logic.
4. I left some unit tests in TestIFile that describe the issue (they depend on
the native library and are expected to fail). They don't prove the fix
directly; they just show the root cause (the scenario that [~gopalv]
described), for which setting the originalSize back could be a solution.
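
A minimal sketch of the restore pattern from 1a, assuming a codec that reads
its buffer size from a shared, mutable configuration. SketchConf, SketchCodec,
the property name and the default size are hypothetical stand-ins, not the
Hadoop or Tez classes; only the final setInt call mirrors the line in the
patch. The try/finally wrapper is an assumption of this sketch as well.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Hadoop-style configuration object. */
class SketchConf {
  private final Map<String, Integer> values = new HashMap<>();

  int getInt(String key, int defaultValue) {
    return values.getOrDefault(key, defaultValue);
  }

  void setInt(String key, int value) {
    values.put(key, value);
  }
}

/** Hypothetical codec whose new instances read their buffer size from the shared conf. */
class SketchCodec {
  static final String BUFFER_SIZE_PROP = "sketch.codec.buffersize"; // made-up property name
  static final int DEFAULT_BUFFER_SIZE = 64 * 1024;                 // made-up default

  private final SketchConf conf;

  SketchCodec(SketchConf conf) {
    this.conf = conf;
  }

  SketchConf getConf() {
    return conf;
  }

  /** Buffer size that a newly created compressor/decompressor would pick up. */
  int newInstanceBufferSize() {
    return conf.getInt(BUFFER_SIZE_PROP, DEFAULT_BUFFER_SIZE);
  }
}

class BufferSizeRestoreSketch {
  /**
   * Uses a temporary buffer size for one instance, then puts the original
   * value back so that later instances do not inherit the small size.
   */
  static int createWithTemporaryBufferSize(SketchCodec codec, int temporarySize) {
    SketchConf conf = codec.getConf();
    int originalSize =
        conf.getInt(SketchCodec.BUFFER_SIZE_PROP, SketchCodec.DEFAULT_BUFFER_SIZE);
    try {
      conf.setInt(SketchCodec.BUFFER_SIZE_PROP, temporarySize);
      return codec.newInstanceBufferSize();
    } finally {
      // Same idea as the one-line fix: set the original size back.
      conf.setInt(SketchCodec.BUFFER_SIZE_PROP, originalSize);
    }
  }

  public static void main(String[] args) {
    SketchCodec codec = new SketchCodec(new SketchConf());
    int used = createWithTemporaryBufferSize(codec, 512);
    System.out.println("temporary size used: " + used);
    System.out.println("size seen afterwards: " + codec.newInstanceBufferSize());
  }
}
```

Without the restore step, every instance created later from the same codec
object would see the temporary value, which is the situation the fix is meant
to avoid.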
was (Author: abstractdog):
[~jeagles]: could you please take a look at [^TEZ-4234.03.patch]? This now
seems to be a regression from TEZ-4135 (cc: [~rajesh.balamohan]).
1. The actual fix is one line; please see my explanation in the code comment
there. Basically, it sets the codec's buffer size config back to its original
value, so that compressor/decompressor instances do not end up with the small,
problematic buffer size:
{code}
configurableCodec.getConf().setInt(bufferSizeProp, originalSize);
{code}
2. getDecompressedInputStreamWithBufferSize has been moved to CodecUtils,
together with the fix.
3. CodecUtils.getCodec(conf) has been introduced to get rid of repeated logic.
4. I left some unit tests in TestIFile that describe the issue (they depend on
the native library and are expected to fail). They don't prove the fix
directly; they just show the root cause (the scenario that [~gopalv]
described), for which setting the originalSize back could be a solution.
> Compressor can cause IllegalArgumentException in Buffer.limit where limit
> exceeds capacity
> ------------------------------------------------------------------------------------------
>
> Key: TEZ-4234
> URL: https://issues.apache.org/jira/browse/TEZ-4234
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Turner Eagles
> Assignee: László Bodor
> Priority: Blocker
> Labels: 0.10_blocker
> Fix For: 0.10.0
>
> Attachments: TEZ-4234.01.patch, TEZ-4234.02.patch, TEZ-4234.03.patch,
> TEZ-4234.repro.patch, TEZ-4234.wip.patch
>
>
> {code}
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Buffer.java:275)
> at org.apache.hadoop.io.compress.lz4.Lz4Compressor.compress(Lz4Compressor.java:240)
> at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
> at org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:131)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.writeKVPair(IFile.java:423)
> at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:392)
> at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:927)
> at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:865)
> at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.flush(DefaultSorter.java:729)
> at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:191)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:398)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:83)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2035)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
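
For reference, the exception at the top of the trace is the standard
java.nio behaviour when a limit larger than the buffer's capacity is
requested. The sketch below only illustrates that JDK behaviour; it is not
the Lz4Compressor code, and the class name, method name and sizes are
hypothetical, chosen to mimic a buffer allocated from a small configured size
while the caller assumes a larger one.

```java
import java.nio.ByteBuffer;

class BufferLimitSketch {
  /** Returns true if setting the requested limit is rejected by the buffer. */
  static boolean limitFails(int capacity, int requestedLimit) {
    ByteBuffer buffer = ByteBuffer.allocate(capacity);
    try {
      buffer.limit(requestedLimit);
      return false;
    } catch (IllegalArgumentException e) {
      // Thrown when the requested limit is negative or larger than the capacity.
      return true;
    }
  }

  public static void main(String[] args) {
    // Limit equal to capacity is accepted.
    System.out.println("limit 512 on capacity 512 fails: " + limitFails(512, 512));
    // Limit beyond capacity is rejected.
    System.out.println("limit 65536 on capacity 512 fails: " + limitFails(512, 64 * 1024));
  }
}
```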