[
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513830#comment-14513830
]
TezQA commented on TEZ-2358:
----------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12728354/TEZ-2358.4.patch
against master revision 2935ef4.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new
or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/541//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/541//console
This message is automatically generated.
> Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task
> -------------------------------------------------------------------------
>
> Key: TEZ-2358
> URL: https://issues.apache.org/jira/browse/TEZ-2358
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Gopal V
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2358.1.patch, TEZ-2358.2.patch, TEZ-2358.3.patch,
> TEZ-2358.4.patch, syslog_attempt_1429683757595_0141_1_01_000143_0.syslog.bz2
>
>
> The Tez MergeManager code assumes that the src-task-id is unique between
> merge operations. This causes confusion when two merge sequences have to
> process output from the same src-task-id.
> {code}
> private TezRawKeyValueIterator finalMerge(Configuration job, FileSystem fs,
>     List<MapOutput> inMemoryMapOutputs,
>     List<FileChunk> onDiskMapOutputs
> ...
>   if (inMemoryMapOutputs.size() > 0) {
>     int srcTaskId =
>         inMemoryMapOutputs.get(0).getAttemptIdentifier().getInputIdentifier().getInputIndex();
> ...
>     // must spill to disk, but can't retain in-mem for intermediate merge
>     final Path outputPath =
>         mapOutputFile.getInputFileForWrite(srcTaskId, inMemToDiskBytes)
>             .suffix(Constants.MERGED_OUTPUT_PREFIX);
> ...
> {code}
> This, or some related scenario, results in the following FileChunks list,
> which contains identically named paths with different lengths.
> {code}
> 2015-04-23 03:28:50,983 INFO [MemtoDiskMerger [Map_1]]
> orderedgrouped.MergeManager: Initiating in-memory merge with 6 segments...
> 2015-04-23 03:28:50,987 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger:
> Merging 6 sorted segments
> 2015-04-23 03:28:50,988 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: Down
> to the last merge-pass, with 6 segments left of total size: 1165944755 bytes
> 2015-04-23 03:28:58,495 INFO [MemtoDiskMerger [Map_1]]
> orderedgrouped.MergeManager: attempt_1429683757595_0141_1_01_000143_0_10027
> Merge of the 6 files in-memory complete. Local file is
> /grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out.merged
> of size 785583965
> 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]]
> orderedgrouped.MergeManager: finalMerge called with 0 in-memory map-outputs
> and 5 on-disk map-outputs
> 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]]
> orderedgrouped.MergeManager: GOPAL: onDiskBytes = 365232290 +=
> 365232290for/grid/4/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_1023.out
> 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]]
> orderedgrouped.MergeManager: GOPAL: onDiskBytes = 730529899 +=
> 365297609for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
> 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]]
> orderedgrouped.MergeManager: GOPAL: onDiskBytes = 1095828683 +=
> 365298784for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
> {code}
> The multiple instances of 404.out indicate that we pulled two pipelined
> chunks of the same shuffle src id, once into memory and twice onto disk.
> {code}
> 2015-04-23 03:28:08,256 INFO
> [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]]
> orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143,
> targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host:
> cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent:
> attempt_1429683757595_0141_1_00_000404_0_10009_0, runDuration: 0]
> 2015-04-23 03:28:08,270 INFO
> [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]]
> orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143,
> targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host:
> cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent:
> attempt_1429683757595_0141_1_00_000404_0_10009_1, runDuration: 0]
> 2015-04-23 03:28:08,272 INFO
> [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]]
> orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143,
> targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host:
> cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent:
> attempt_1429683757595_0141_1_00_000404_0_10009_2, runDuration: 0]
> {code}
> This will fail depending on how many times _404_0 is at the top of the
> FileChunks list in this run.
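
The naming collision described in the quoted issue can be illustrated with a minimal, self-contained Java sketch. It is not Tez code and not the attached patch: the class and both helper methods are hypothetical, and the second naming scheme (adding a spill id to the file name) is only an assumed way to disambiguate pipelined chunks. The sketch shows that a path built from the source task index alone yields the same name for every pipelined chunk of source task 404, as seen in the FileChunks log above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Tez.
class MergePathCollisionSketch {

    // Assumed scheme: the file name depends only on the source task index,
    // mirroring the "<attempt>_spill_404.out" names seen in the log.
    static String pathBySrcTaskOnly(String attempt, int srcTaskId) {
        return attempt + "_spill_" + srcTaskId + ".out";
    }

    // Assumed alternative: also include a per-chunk spill id in the name.
    static String pathBySrcTaskAndSpill(String attempt, int srcTaskId, int spillId) {
        return attempt + "_spill_" + srcTaskId + "_" + spillId + ".out";
    }

    // Counts how many entries repeat a path that was already seen.
    static int countCollisions(List<String> paths) {
        Map<String, Integer> seen = new HashMap<>();
        int collisions = 0;
        for (String path : paths) {
            int occurrences = seen.merge(path, 1, Integer::sum);
            if (occurrences > 1) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        String attempt = "attempt_1429683757595_0141_1_01_000143_0_10027";
        int srcTaskId = 404;

        List<String> keyedBySrcTask = new ArrayList<>();
        List<String> keyedBySrcTaskAndSpill = new ArrayList<>();
        // Three pipelined chunks arrive from the same source task.
        for (int spillId = 0; spillId < 3; spillId++) {
            keyedBySrcTask.add(pathBySrcTaskOnly(attempt, srcTaskId));
            keyedBySrcTaskAndSpill.add(pathBySrcTaskAndSpill(attempt, srcTaskId, spillId));
        }

        System.out.println("srcTaskId only: "
            + countCollisions(keyedBySrcTask) + " colliding paths");
        System.out.println("srcTaskId + spillId: "
            + countCollisions(keyedBySrcTaskAndSpill) + " colliding paths");
    }
}
```

Whether the real fix extends the file name, tracks chunks by a richer key, or does something else is not stated in this message; the sketch only demonstrates why a key based on the source task index alone is not unique under pipelined shuffle.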