[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491299#comment-15491299
 ] 

Gera Shegalov commented on MAPREDUCE-6778:
------------------------------------------

Thanks for working on this longstanding issue, [~abalitsky1]. I see the 
following drawbacks:
- potential for a circular dependency: stdout/stderr -> log4j -> appender -> stdout
- user code can override System.out as well
- as you point out, the solution is limited in scope to MR
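
To make the first point concrete, here is a minimal sketch in plain Java. It does not use log4j; {{StreamAppender}}, {{LineForwardingStream}} and {{redirectThrough}} are hypothetical names made up for illustration and are not taken from the patch. It shows one way the cycle could be avoided: the appender is bound to the original stream before System.out is replaced.

```java
import java.io.OutputStream;
import java.io.PrintStream;

public class RedirectSketch {

    /** Hypothetical stand-in for a log4j appender that writes to a fixed stream. */
    static final class StreamAppender {
        private final PrintStream target;
        StreamAppender(PrintStream target) { this.target = target; }
        void append(String line) { target.println("[LOG] " + line); }
    }

    /** Buffers bytes and hands each completed line to the appender. */
    static final class LineForwardingStream extends OutputStream {
        private final StreamAppender appender;
        private final StringBuilder line = new StringBuilder();
        LineForwardingStream(StreamAppender appender) { this.appender = appender; }

        @Override
        public void write(int b) {
            if (b == '\n') {
                appender.append(line.toString());
                line.setLength(0);
            } else if (b != '\r') {
                line.append((char) b);   // ASCII-only, for brevity
            }
        }
    }

    /** Builds a replacement stream whose output ends up in realTarget. */
    static PrintStream redirectThrough(PrintStream realTarget) {
        return new PrintStream(
            new LineForwardingStream(new StreamAppender(realTarget)), true);
    }

    public static void main(String[] args) {
        // Capture the real stdout BEFORE replacing System.out. If the appender
        // looked up System.out at append time instead, it would write back into
        // LineForwardingStream and recurse: stdout -> logger -> appender -> stdout.
        PrintStream realOut = System.out;
        System.setOut(redirectThrough(realOut));
        System.out.println("hello from user code");
        System.setOut(realOut);
    }
}
```

Note that this does nothing about the second point: user code that later calls System.setOut itself would bypass the redirect entirely.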

One would think the solution for all YARN apps could be Linux containers 
with disk quotas, but this seems to be broken: 
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1515615. Besides, apps can 
write their logs wherever they please.

This indeed necessitates an app-specific solution, but it would be nice if it 
were more general and easier to replicate across apps. 

Some past solution attempts were based on piping via {{tail}}. As you allude 
to, the output is then not visible until the container execution ends, which 
is difficult for devops with MR, let alone long-running services.

I would suggest allowing a "| <command>" to be specified instead of a simple 
redirect. This should be optional because any external process increases the 
container size. The benefit is minimal coding and maximum flexibility. Some 
users will be OK with {{tail}}; some will deploy 
[rotatelogs|https://httpd.apache.org/docs/2.4/programs/rotatelogs.html] or 
[other utils|http://superuser.com/questions/291368/log-rotation-of-stdout].
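
As a rough sketch of what I mean, the code that assembles the task launch command could pick between the plain redirect and a pipe. The {{stdoutClause}} helper, its parameters, and the paths below are hypothetical and only show the shape of the generated shell fragment; none of them are existing Hadoop APIs or properties.

```java
public class PipeRedirectSketch {

    /**
     * Returns the shell fragment appended to the task's JVM command line.
     * pipeCommand is a hypothetical, optional setting: when it is null or
     * blank the plain redirect is kept; otherwise the command replaces the
     * redirect and is responsible for writing (and limiting) its own output.
     */
    static String stdoutClause(String logFile, String pipeCommand) {
        if (pipeCommand == null || pipeCommand.trim().isEmpty()) {
            return "1>" + logFile;
        }
        return "| " + pipeCommand.trim();
    }

    public static void main(String[] args) {
        String log = "/tmp/container/stdout";   // illustrative path only

        // Default: simple redirect, no extra process in the container.
        System.out.println("<task-jvm-command> " + stdoutClause(log, null));

        // Opt-in: keep only the last 1 MiB; output appears when the task ends.
        System.out.println("<task-jvm-command> "
            + stdoutClause(log, "tail -c 1048576 >" + log));
    }
}
```

A user who prefers rotatelogs would put that command line in place of the {{tail}} one; the launcher itself would not need to know the difference.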
  

> Provide way to limit MRJob's stdout/stderr size
> -----------------------------------------------
>
>                 Key: MAPREDUCE-6778
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6778
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.7.0
>            Reporter: Aleksandr Balitsky
>            Priority: Minor
>         Attachments: MAPREDUCE-6778.v1.001.patch
>
>
> We can run a job that produces a huge amount of stdout/stderr, with undesired 
> consequences.
> A possible solution is to redirect stdout and stderr output to log4j in the 
> YarnChild.java main method.
> In this case the System.out and System.err streams will be redirected to a log4j 
> logger with an appender that directs output into the stderr or stdout files 
> with the needed size limit. We are thereby able to limit the log size on the 
> fly, keeping one backup rolling file (thanks to ContainerRollingLogAppender).
> One of the syslog size limitation approaches works the same way.
> So, we can set the limits via new properties in mapred-site.xml:
> mapreduce.task.userlog.stderr.limit.kb
> mapreduce.task.userlog.stdout.limit.kb
> Advantages of this solution:
> - it allows us to restrict file sizes during job execution.
> - we can see logs during job execution.
> Disadvantages:
> - It will work only for MR jobs.
> Is this an appropriate solution to the problem, or is there something 
> better?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

