[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221090#comment-14221090
 ] 

Jason Lowe commented on MAPREDUCE-5932:
---------------------------------------

Thanks for the update.

With the addition of the AM flag to addLog4jProperties, we're now inconsistent 
in how we convey the "task" type.  The AM case is passed as a direct argument, 
while for map/reduce tasks the caller stuffs the value into the conf 
immediately before the call and the callee has to dig it back out.  It'd be 
simpler (and faster) to just pass the isMapTask info to addLog4jProperties 
directly, since the caller already knows it.  There are many ways to do this: 
tacking on yet another boolean (which isn't great for readability given 
there's now one for the AppMaster), passing an enum that can differentiate 
AM/map/reduce (not sure one exists to reuse), passing the Task/TaskId object 
with null meaning AM, etc.
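
As a rough sketch of the enum option, assuming a simplified signature: 
ContainerKind, Log4jArgsSketch, buildArgs and the -Dsketch.* property names 
are all hypothetical and come from neither the patch nor Hadoop. Only the 
syslog.shuffle file name is taken from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enum; not an existing Hadoop type.
enum ContainerKind { APP_MASTER, MAP, REDUCE }

final class Log4jArgsSketch {
    private Log4jArgsSketch() {}

    // Simplified stand-in for addLog4jProperties. The caller states the
    // container kind explicitly, so nothing has to be read back from the conf.
    // All -Dsketch.* property names are placeholders for illustration.
    static void addLog4jProperties(String logLevel, List<String> vargs,
                                   ContainerKind kind) {
        vargs.add("-Dsketch.log.level=" + logLevel);
        vargs.add("-Dsketch.container.kind=" + kind);
        switch (kind) {
            case REDUCE:
                // Only reducers run the client side of the shuffle.
                vargs.add("-Dsketch.shuffle.log=syslog.shuffle");
                break;
            case APP_MASTER:
            case MAP:
                // No shuffle log for these container kinds.
                break;
        }
    }

    // Convenience wrapper that returns a fresh argument list.
    static List<String> buildArgs(String logLevel, ContainerKind kind) {
        List<String> vargs = new ArrayList<>();
        addLog4jProperties(logLevel, vargs, kind);
        return vargs;
    }
}
```

A call site for a reducer would then read 
addLog4jProperties(level, vargs, ContainerKind.REDUCE), which says what it 
means without a pair of adjacent booleans.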



> Provide an option to use a dedicated reduce-side shuffle log
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-5932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5932
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 2.4.0
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-5932.v01.patch, MAPREDUCE-5932.v02.patch, 
> MAPREDUCE-5932.v03.patch, MAPREDUCE-5932.v04.patch
>
>
> For reducers in large jobs, our users cannot easily spot the portions of the 
> log associated with problems in their code. An example reducer with INFO-level 
> logging generates ~3500 lines / ~700KiB per second. 95% of the log is 
> the client side of the shuffle {{org.apache.hadoop.mapreduce.task.reduce.*}}
> {code}
> $ wc syslog 
>     3642   48192  691013 syslog
> $ grep task.reduce syslog | wc 
>     3424   46534  659038
> $ grep task.reduce.ShuffleScheduler syslog | wc 
>     1521   17745  251458
> $ grep task.reduce.Fetcher syslog | wc 
>     1045   15340  223683
> $ grep task.reduce.InMemoryMapOutput syslog | wc 
>      400    4800   72060
> $ grep task.reduce.MergeManagerImpl syslog | wc 
>      432    8200  106555
> {code}
> Byte percentage breakdown:
> {code}
> Shuffle total:           95%
> ShuffleScheduler:        36%
> Fetcher:                 32%
> InMemoryMapOutput:       10%
> MergeManagerImpl:        15%
> {code}
> While this information is often useful for devops debugging shuffle 
> performance issues, job users are often lost in it. 
> We propose to have a dedicated syslog.shuffle file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
