[jira] Commented: (HADOOP-211) logging improvements for Hadoop

Sanjay Dahiya (JIRA) Mon, 03 Jul 2006 05:52:16 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-211?page=comments#action_12418962 ]


Sanjay Dahiya commented on HADOOP-211:
--------------------------------------

I'm looking at supporting logging features such as caps on time/size, gzip 
compression, and archiving log files into DFS. Log4j 1.3 with XML configuration 
makes it easy to implement all of these, because RollingPolicies and Triggers 
are separated from appenders. The properties file format doesn't allow 
RollingPolicies to be specified externally for existing Appenders. 
Are you embedding Tomcat within Hadoop or using Hadoop from a webapp? Is it 
possible to make Tomcat use its own properties file, or to configure Log4j for 
the webapp separately in the webapp's class loader? 

> logging improvements for Hadoop
> -------------------------------
>
>          Key: HADOOP-211
>          URL: http://issues.apache.org/jira/browse/HADOOP-211
>      Project: Hadoop
>         Type: Improvement

>     Versions: 0.2.0
>     Reporter: Sameer Paranjpye
>     Assignee: Sameer Paranjpye
>     Priority: Minor
>      Fix For: 0.3.0
>  Attachments: acl-log4j-II.patch.tgz, acl-log4j-webapps.patch, 
> acl-log4j.patch, commons_logging_patch
>
> Here's a proposal for some improvements to the way Hadoop does logging. It 
> advocates 3 
> broad changes to the way logging is currently done, these being:
> - The use of a uniform logging format by all Hadoop subsystems
> - The use of Apache commons logging as a facade above an underlying logging 
> framework
> - The use of Log4J as the underlying logging framework instead of 
> java.util.logging
> This is largely polishing work, but it seems like it would make log analysis 
> and debugging
> easier in the short term. In the long term, it would future proof logging to 
> the extent of
> allowing the logging framework used to change while requiring minimal code 
> change. The 
> proposed changes are motivated by the following requirements, which we think 
> Hadoop's logging should meet:
> - Hadoop's logs should be amenable to analysis by tools like grep, sed, awk, 
> etc.
> - Log entries should be clearly annotated with a timestamp and a logging level
> - Log entries should be traceable to the subsystem from which they originated
> - The logging implementation should allow log entries to be annotated with 
> source code 
> location information like classname, methodname, file and line number, 
> without requiring
> code changes
> - It should be possible to change the logging implementation used without 
> having to change
> thousands of lines of code
> - The mapping of loggers to destinations (files, directories, servers etc.) 
> should be 
> specified and modifiable via configuration
> Uniform logging format:
> All Hadoop logs should have the following structure.
> <Header>\n
> <LogEntry>\n [<Exception>\n]
> .
> .
> .
> where the header line specifies the format of each log entry. The header line 
> has the format:
> '# <Fieldname> <Fieldname>...\n'. 
> The default format of each log entry is: '# Timestamp Level LoggerName 
> Message', where:
> - Timestamp is a date and time in the format MM/DD/YYYY:HH:MM:SS
> - Level is the logging level (FATAL, WARN, DEBUG, TRACE, etc.)
> - LoggerName is the short name of the logging subsystem from which the 
> message originated e.g.
> fs.FSNamesystem, dfs.Datanode etc.
> - Message is the log message produced
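
A minimal sketch of the entry format described above, using only the JDK. The class and method names are hypothetical, and fixing the time zone to UTC is an assumption made here so that the output is reproducible; the proposal does not specify a time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; names are illustrative and not part of Hadoop.
final class UniformLogFormat {
    // Header line naming the fields of each entry, as in the proposal.
    static final String HEADER = "# Timestamp Level LoggerName Message";

    // Formats one entry as "Timestamp Level LoggerName Message".
    static String formatEntry(long millis, String level, String loggerName, String message) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is used per call.
        SimpleDateFormat fmt = new SimpleDateFormat("MM/dd/yyyy:HH:mm:ss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC timestamps
        return fmt.format(new Date(millis)) + " " + level + " " + loggerName + " " + message;
    }

    public static void main(String[] args) {
        System.out.println(HEADER);
        System.out.println(formatEntry(0L, "WARN", "dfs.Datanode", "block report delayed"));
    }
}
```

Because every field before the message is free of spaces, tools like awk can split an entry on whitespace and treat the remainder as the message.
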
> Why Apache commons logging and Log4J?
> Apache commons logging is a facade meant to be used as a wrapper around an 
> underlying logging
> implementation. Bridges from Apache commons logging to popular logging 
> implementations 
> (Java logging, Log4J, Avalon etc.) are implemented and available as part of 
> the commons logging
> distribution. Implementing a bridge to an unsupported implementation is 
> fairly straightforward
> and involves subclassing the commons logging LogFactory class and 
> implementing its Log interface. Using Apache commons logging and making all 
> logging calls through it 
> enables us to
> move to a different logging implementation by simply changing configuration 
> in the best case.
> Even otherwise, it incurs minimal code churn overhead.
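
The facade idea can be sketched with the JDK alone. The types below (MiniLog, MiniLogFactory, the two bridges) and the system property name are hypothetical stand-ins, not the commons logging classes; the sketch only shows how application code that depends on a small interface can switch back ends through configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical miniature facade; these are NOT the commons logging classes.
final class FacadeSketch {
    interface MiniLog {
        void info(String message);
        void warn(String message);
    }

    // Bridge that forwards calls to java.util.logging.
    static final class JulBridge implements MiniLog {
        private final Logger delegate;
        JulBridge(String name) { this.delegate = Logger.getLogger(name); }
        @Override public void info(String message) { delegate.log(Level.INFO, message); }
        @Override public void warn(String message) { delegate.log(Level.WARNING, message); }
    }

    // Bridge that collects lines in memory, standing in for another back end.
    static final class MemoryBridge implements MiniLog {
        static final List<String> LINES = new ArrayList<>();
        private final String name;
        MemoryBridge(String name) { this.name = name; }
        @Override public void info(String message) { LINES.add("INFO " + name + " " + message); }
        @Override public void warn(String message) { LINES.add("WARN " + name + " " + message); }
    }

    static final class MiniLogFactory {
        // The back end is chosen from a system property (hypothetical name),
        // so switching implementations needs no change in calling code.
        static MiniLog getLog(String name) {
            String backend = System.getProperty("minilog.backend", "jul");
            return "memory".equals(backend) ? new MemoryBridge(name) : new JulBridge(name);
        }
    }

    // Application code depends only on the MiniLog interface.
    static void doWork(MiniLog log) {
        log.warn("heartbeat missed");
    }

    public static void main(String[] args) {
        System.setProperty("minilog.backend", "memory");
        doWork(MiniLogFactory.getLog("dfs.Datanode"));
        System.out.println(MemoryBridge.LINES);
    }
}
```
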
> Log4J offers a few benefits over java.util.logging that make it a more 
> desirable choice for the
> logging back end.
> - Configuration Flexibility: The mapping of loggers to destinations (files, 
> sockets etc.)
> can be completely specified in configuration. It is possible to do this with 
> Java logging as
> well, however, configuration is a lot more restrictive. For instance, with 
> Java logging all 
> log files must have names derived from the same pattern. For the namenode, 
> log files could 
> be named with the pattern "%h/namenode%u.log" which would put log files in 
> the user.home
> directory with names like namenode0.log etc. With Log4J it would be possible 
> to configure
> the namenode to emit log files with different names, say heartbeats.log, 
> namespace.log,
> clients.log etc. Configuration variables in Log4J can also have the values of 
> system 
> properties embedded in them.
> - Takes wrappers into account: Log4J takes into account the possibility that 
> an application
> may be invoking it via a wrapper, such as Apache commons logging. This is 
> important because
> logging event objects must be able to infer the context of the logging call 
> such as classname,
> methodname etc. Inferring context is a relatively expensive operation that 
> involves creating
> an exception and examining the stack trace to find the frame just before the 
> first frame 
> of the logging framework. It is therefore done lazily only when this 
> information actually 
> needs to be logged. Log4J can be instructed to look for the frame 
> corresponding to the wrapper
> class, Java logging cannot. In the case of Java logging this means that a) 
> the bridge from 
> Apache commons logging is responsible for inferring the calling context and 
> setting it in the 
> logging event and b) this inference has to be done on every logging call 
> regardless of whether
> or not it is needed.
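
The stack inspection described above can be sketched as follows. This is a simplified illustration with hypothetical class names, not Log4J's actual code: it creates a Throwable, walks the stack trace, and returns the first frame after the frames that belong to the wrapper class. A real framework would call such a lookup lazily, only when location information is about to be written.

```java
// Hypothetical sketch; a simplification of what a logging framework does.
final class CallerLookup {
    // Stand-in for a logging wrapper such as a facade class.
    static final class Wrapper {
        static StackTraceElement log() {
            // Tell the lookup which class is the wrapper to skip over.
            return findCaller(Wrapper.class.getName());
        }
    }

    // Returns the first frame found after the wrapper's frames, or null if
    // the wrapper class does not appear on the stack.
    static StackTraceElement findCaller(String wrapperClassName) {
        StackTraceElement[] frames = new Throwable().getStackTrace();
        boolean seenWrapper = false;
        for (StackTraceElement frame : frames) {
            boolean inWrapper = frame.getClassName().equals(wrapperClassName);
            if (seenWrapper && !inWrapper) {
                return frame; // the application code that called the wrapper
            }
            seenWrapper |= inWrapper;
        }
        return null;
    }

    // Plays the role of application code issuing a logging call.
    static StackTraceElement applicationMethod() {
        return Wrapper.log();
    }

    public static void main(String[] args) {
        StackTraceElement caller = applicationMethod();
        System.out.println(caller.getClassName() + "." + caller.getMethodName());
    }
}
```

Without the wrapper class name, the search would stop at the wrapper's own frame and report the wrapper as the caller of every log entry.
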
> - More handy features: Log4J has some handy features that Java logging 
> doesn't. A couple
> of examples of these:
> a) Date based rolling of log files 
> b) Format control through configuration. Log4J has a PatternLayout class that 
> can be 
> configured to generate logs with a user specified pattern. The logging format 
> described
> above can be described as "%d{MM/dd/yyyy:HH:mm:ss} %c{2} %p %m". The format 
> specifiers
> indicate that each log line should have the date and time followed by the 
> logger name followed
> by the logging level or priority followed by the application generated 
> message.
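
In Log4J 1.x the text inside %d{...} is interpreted as a java.text.SimpleDateFormat pattern, in which lowercase ss means seconds and uppercase S means milliseconds, so the case of the last field matters. The sketch below uses only the JDK to show the difference; the class name is hypothetical, and UTC is an assumption made so the output is reproducible.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; it exercises SimpleDateFormat, not Log4J itself.
final class DatePatternDemo {
    static String format(String pattern, long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        long t = 45_678L; // 45 seconds and 678 milliseconds after the epoch
        System.out.println(format("MM/dd/yyyy:HH:mm:ss", t)); // seconds field
        System.out.println(format("MM/dd/yyyy:HH:mm:SS", t)); // milliseconds field
    }
}
```
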

