[
https://issues.apache.org/jira/browse/TEZ-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696530#comment-14696530
]
TezQA commented on TEZ-2719:
----------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12750447/TEZ-2719.1.patch
against master revision 6b67b0b.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests.
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/989//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/989//console
This message is automatically generated.
> Consider reducing logs in unordered fetcher with shared-fetch option
> --------------------------------------------------------------------
>
> Key: TEZ-2719
> URL: https://issues.apache.org/jira/browse/TEZ-2719
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2719.1.patch, TEZ-2719.2.patch,
> TEZ-2719.branch-0.7.patch
>
>
> For large broadcasts, this can be a problem.
> E.g., in one of the jobs (query_17 @ 10 TB scale), Map 7 generates around
> 1.1 GB of data, which is consumed by 330 tasks in the downstream Map 1.
> Map 1 uses all slots in the cluster (~224 per wave). Until the data is
> downloaded, the shared fetch ends up re-queuing fetches, and as part of that
> it prints 3 log lines per attempt. E.g.:
> {noformat}
> 2015-08-14 02:09:11,761 INFO [Fetcher [Map_7] #0] shuffle.Fetcher: Requeuing
> machine1:13562 downloads because we didn't get a lock
> 2015-08-14 02:09:11,761 INFO [Fetcher [Map_7] #0] shuffle.Fetcher: Shared
> fetch failed to return 1 inputs on this try
> 2015-08-14 02:09:11,761 INFO [ShuffleRunner [Map_7]] impl.ShuffleManager:
> Scheduling fetch for inputHost: machine1:13562
> 2015-08-14 02:09:11,761 INFO [ShuffleRunner [Map_7]] impl.ShuffleManager:
> Created Fetcher for host: machine1 with inputs: [InputAttemptIdentifier
> [inputIdentifier=InputIdentifier [inputIndex=0], attemptNumber=0,
> pathComponent=attempt_1439264591968_0058_1_04_000000_0_10029,
> fetchTypeInfo=FINAL_MERGE_ENABLED, spillEventId=-1]]
> {noformat}
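> The retry cycle behind those lines is roughly the following (a simplified
> sketch of the behavior, not the actual Tez code; all names here are
> illustrative):
> {code:java}
> // Simplified model of the shared-fetch retry cycle: while another fetcher on
> // the same machine holds the shared-fetch lock, each attempt fails fast,
> // logs, and gets rescheduled immediately. Names are illustrative only.
> import java.util.List;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> public class SharedFetchRetrySketch {
>   private static final Logger LOG =
>       LoggerFactory.getLogger(SharedFetchRetrySketch.class);
>
>   void fetchLoop(String host, List<String> pendingInputs) {
>     while (!pendingInputs.isEmpty()) {
>       if (!tryAcquireSharedFetchLock(host)) {
>         // Two INFO lines from the fetcher per failed attempt ...
>         LOG.info("Requeuing {} downloads because we didn't get a lock", host);
>         LOG.info("Shared fetch failed to return {} inputs on this try",
>             pendingInputs.size());
>         requeue(host, pendingInputs); // ... plus more INFO lines when the
>         continue;                     // ShuffleManager reschedules the fetch
>       }
>       downloadAll(host, pendingInputs); // the lock holder does the real work
>       releaseSharedFetchLock(host);
>       pendingInputs.clear();
>     }
>   }
>
>   // Stubs standing in for the real lock and scheduling machinery.
>   boolean tryAcquireSharedFetchLock(String host) { return true; }
>   void releaseSharedFetchLock(String host) {}
>   void requeue(String host, List<String> inputs) {}
>   void downloadAll(String host, List<String> inputs) {}
> }
> {code}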
> Depending on disk / network speed, it can take a while for the fetcher to
> finish downloading and release the lock. Since there was only one task in
> Map 1, this ended up in a sort of tight loop, generating relatively large
> logs. It looks like 260-290 MB task log files are created per attempt in
> this case. That works out to around 2.3 GB to 3 GB (depending on the number
> of slots waiting) on a machine with 8-10 slots.
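> Given the title's suggestion, one way to cut this volume would be to demote
> the per-retry messages to DEBUG and keep only an occasional INFO summary. A
> minimal sketch (hypothetical names; the attached patches may well take a
> different approach):
> {code:java}
> // Hypothetical throttle for the lock-miss log lines shown above. This is an
> // illustrative sketch, not the actual TEZ-2719 patch.
> import java.util.concurrent.atomic.AtomicLong;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> public class SharedFetchLogThrottler {
>   private static final Logger LOG =
>       LoggerFactory.getLogger(SharedFetchLogThrottler.class);
>   // Emit an INFO summary only once per this many lock-miss retries.
>   private static final long LOG_EVERY_N_RETRIES = 1000;
>   private final AtomicLong lockMisses = new AtomicLong();
>
>   /** Called each time the fetcher fails to acquire the shared-fetch lock. */
>   public void onLockMiss(String host) {
>     long misses = lockMisses.incrementAndGet();
>     if (LOG.isDebugEnabled()) {
>       // Full per-retry detail is still available at DEBUG.
>       LOG.debug("Requeuing {} downloads because we didn't get a lock", host);
>     } else if (misses % LOG_EVERY_N_RETRIES == 0) {
>       // A rare INFO summary keeps operators informed without the
>       // 3-lines-per-attempt flood described above.
>       LOG.info("Shared fetch still waiting for lock on {} ({} retries so far)",
>           host, misses);
>     }
>   }
> }
> {code}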