[ https://issues.apache.org/jira/browse/TEZ-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937112#comment-15937112 ]
Zhiyuan Yang edited comment on TEZ-3616 at 3/23/17 3:29 AM:
------------------------------------------------------------
Thanks [~ferhui] for working on this! As you said, the issue is caused by the
merge finishing too early. TEZ-2859 tried to fix the same problem, but
unfortunately the artificial delay wasn't introduced in the right place.
{code}
tmpDir = new Path(inputContext.getUniqueIdentifier());
try {
  ....
  writer.close();
  additionalBytesWritten.increment(writer.getCompressedLength());
} catch (IOException e) {
  localFS.delete(outputPath, true);
  throw e;
}
final long outputLen = localFS.getFileStatus(outputPath).getLen();
closeOnDiskFile(new FileChunk(outputPath, 0, outputLen));
{code}
The interrupt is supposed to happen while the onDiskMerger thread is inside the
try-catch block. Feeding the merger more data can work around this, but a more
promising fix is to make the thread spend longer inside the try-catch block.
Maybe we can introduce the desired delay by using a mock TezCounter for
additionalBytesWritten.
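
A rough sketch of that idea, independent of Tez so it runs on its own. The
Counter interface here is a hypothetical stand-in for the single TezCounter
method the merger calls, and BlockingCounter, DelayedCounterSketch and
interruptLandsInsideTry are illustrative names, not existing Tez or test code.
The counter signals that increment() has been entered and then blocks, so the
test thread can interrupt the merger while it is still inside the try block.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class DelayedCounterSketch {

  /** Hypothetical stand-in for the one counter method the merger calls. */
  interface Counter {
    void increment(long delta);
  }

  /** Test double: announces entry into increment(), then blocks. */
  static final class BlockingCounter implements Counter {
    final CountDownLatch entered = new CountDownLatch(1);
    private final long maxBlockMillis;

    BlockingCounter(long maxBlockMillis) {
      this.maxBlockMillis = maxBlockMillis;
    }

    @Override
    public void increment(long delta) {
      entered.countDown(); // tell the test thread we are inside the try block
      try {
        Thread.sleep(maxBlockMillis); // the artificial delay
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the flag visible to the caller
      }
    }
  }

  /** Returns true if the interrupt reached the worker while inside its try block. */
  static boolean interruptLandsInsideTry() {
    BlockingCounter counter = new BlockingCounter(5_000);
    AtomicBoolean interruptedInsideTry = new AtomicBoolean(false);

    Thread merger = new Thread(() -> {
      try {
        // The real merge would write and close its output before this call.
        counter.increment(42L);
        if (Thread.currentThread().isInterrupted()) {
          interruptedInsideTry.set(true);
        }
      } catch (RuntimeException e) {
        // Stands in for the cleanup branch of the real merge; unused here.
      }
    }, "onDiskMerger-sketch");

    merger.start();
    try {
      // Interrupt only after the worker is known to be inside increment().
      if (!counter.entered.await(5, TimeUnit.SECONDS)) {
        return false;
      }
      merger.interrupt();
      merger.join(5_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return interruptedInsideTry.get();
  }
}
```

In the real test the same blocking behaviour would presumably be attached to a
mocked TezCounter and passed in as additionalBytesWritten; how the counter is
injected into the MergeManager under test is not shown here.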
> TestMergeManager#testLocalDiskMergeMultipleTasks fails intermittently
> ----------------------------------------------------------------------
>
> Key: TEZ-3616
> URL: https://issues.apache.org/jira/browse/TEZ-3616
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.0
> Environment: Ubuntu 14.04
> Reporter: Sonia Garudi
> Assignee: Fei Hui
> Labels: ppc64le, x86
> Attachments: TEZ-3616.001.patch
>
>
> In the tez-runtime-library project, the
> TestMergeManager#testLocalDiskMergeMultipleTasks test fails intermittently
> with the following error:
> testLocalDiskMergeMultipleTasks(org.apache.tez.runtime.library.common.shuffle.orderedgrouped.TestMergeManager) Time elapsed: 1.395 sec <<< FAILURE!
> java.lang.AssertionError: Values should be different. Actual: 1
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failEquals(Assert.java:185)
>   at org.junit.Assert.assertNotEquals(Assert.java:161)
>   at org.junit.Assert.assertNotEquals(Assert.java:198)
>   at org.junit.Assert.assertNotEquals(Assert.java:209)
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.TestMergeManager.testLocalDiskMergeMultipleTasks(TestMergeManager.java:878)
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.TestMergeManager.testLocalDiskMergeMultipleTasks(TestMergeManager.java:628)
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)