[ 
https://issues.apache.org/jira/browse/TEZ-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696530#comment-14696530
 ] 

TezQA commented on TEZ-2719:
----------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12750447/TEZ-2719.1.patch
  against master revision 6b67b0b.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/989//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/989//console

This message is automatically generated.

> Consider reducing logs in unordered fetcher with shared-fetch option
> --------------------------------------------------------------------
>
>                 Key: TEZ-2719
>                 URL: https://issues.apache.org/jira/browse/TEZ-2719
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2719.1.patch, TEZ-2719.2.patch, 
> TEZ-2719.branch-0.7.patch
>
>
> With a large broadcast, the logging done by the unordered fetcher's 
> shared-fetch option can become a problem.
> For example, in one of the jobs (query_17 @ 10 TB scale), Map 7 generates 
> around 1.1 GB of data, which is given to 330 tasks in downstream Map 1.
> Map 1 uses all slots in the cluster (~224 per wave). Until the data is 
> downloaded, shared fetch keeps re-queuing fetches, and as part of that it 
> prints 3 logs per attempt. E.g.
> {noformat}
> 2015-08-14 02:09:11,761 INFO [Fetcher [Map_7] #0] shuffle.Fetcher: Requeuing machine1:13562 downloads because we didn't get a lock
> 2015-08-14 02:09:11,761 INFO [Fetcher [Map_7] #0] shuffle.Fetcher: Shared fetch failed to return 1 inputs on this try
> 2015-08-14 02:09:11,761 INFO [ShuffleRunner [Map_7]] impl.ShuffleManager: Scheduling fetch for inputHost: machine1:13562
> 2015-08-14 02:09:11,761 INFO [ShuffleRunner [Map_7]] impl.ShuffleManager: Created Fetcher for host: machine1 with inputs: [InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=0], attemptNumber=0, pathComponent=attempt_1439264591968_0058_1_04_000000_0_10029, fetchTypeInfo=FINAL_MERGE_ENABLED, spillEventId=-1]]
> {noformat}
> Depending on disk and network speed, it might take time for the fetcher to 
> finish downloading and release the lock.  Since there was only one task in 
> Map-1, it ended up in a sort of tight loop, generating relatively large logs.
> It looks like task log files of 260-290 MB are created per attempt in this 
> case.  That would be around 2.3 GB to 3 GB (depending on the number of slots 
> waiting) on a machine with 8-10 slots.
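
The tight re-queue loop described above is a case where time-based log throttling could help. The following is only a minimal sketch of that idea, not the approach taken in the attached patches: `RequeueLogThrottle`, `shouldLog`, and `drainSuppressed` are hypothetical names and are not part of the Tez API. It lets one message through per interval and counts the ones it suppresses, so the log still shows how often the lock was missed.

```java
/**
 * Hypothetical helper (not a Tez class): allows at most one log message
 * per interval and counts the messages that were suppressed in between.
 */
final class RequeueLogThrottle {
    private final long intervalMillis;
    private long lastLogMillis;
    private boolean hasLogged;
    private long suppressed;

    RequeueLogThrottle(long intervalMillis) {
        if (intervalMillis < 0) {
            throw new IllegalArgumentException("intervalMillis must be >= 0");
        }
        this.intervalMillis = intervalMillis;
    }

    /** Returns true if the caller should emit its message at time nowMillis. */
    synchronized boolean shouldLog(long nowMillis) {
        if (!hasLogged || nowMillis - lastLogMillis >= intervalMillis) {
            hasLogged = true;
            lastLogMillis = nowMillis;
            return true;
        }
        suppressed++;
        return false;
    }

    /** Returns the number of suppressed messages and resets the counter. */
    synchronized long drainSuppressed() {
        long n = suppressed;
        suppressed = 0;
        return n;
    }

    public static void main(String[] args) {
        // Simulate 5000 failed lock attempts, 1 ms apart, with a 1 s interval.
        RequeueLogThrottle throttle = new RequeueLogThrottle(1000);
        long now = 0;
        for (int attempt = 0; attempt < 5000; attempt++, now++) {
            if (throttle.shouldLog(now)) {
                System.out.println("Requeuing machine1:13562 downloads because "
                    + "we didn't get a lock (" + throttle.drainSuppressed()
                    + " similar messages suppressed)");
            }
        }
    }
}
```

In this simulation the loop prints 5 lines instead of 5000. In a real fetcher the caller would pass `System.currentTimeMillis()` and route the message through its existing logger. Lowering these messages to DEBUG level would be another option.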



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
