Once we settle on something, let's post it on the twiki.
On May 17, 2006, at 10:07 AM, Doug Cutting (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-211?page=comments#action_12412213 ]
Doug Cutting commented on HADOOP-211:
-------------------------------------
The semantics I use for levels are something like:
SEVERE: if this is a production system, someone should be paged,
red lights should flash, etc. Something is definitely wrong and
the system is not operating correctly. Intervention is required.
This should be used sparingly.
WARN: in a production system, warnings should be propagated &
summarized on a central console. If lots are generated then
something may be wrong.
INFO, FINE, FINER, etc. are used for debugging. INFO is the level
normally logged in production, FINE, FINER, etc. are typically only
used when developing.
Is that consistent with the way others use these?
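As a rough sketch, that convention maps onto the commons logging
calls proposed below as follows (the class and the messages are
made up for illustration, and treating FINE/FINER as debug/trace
levels is an assumption):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class LevelConventionSketch {          // hypothetical class
        private static final Log LOG =
            LogFactory.getLog(LevelConventionSketch.class);

        void illustrate(Exception cause) {
            // SEVERE/FATAL: the system is not operating correctly;
            // someone should be paged. Used sparingly.
            LOG.fatal("Namespace image is corrupt; shutting down", cause);

            // WARN: propagated & summarized on a central console;
            // lots of these may indicate a problem.
            LOG.warn("Datanode heartbeat missed, will retry");

            // INFO: the level normally logged in production.
            LOG.info("Block report processed");

            // DEBUG/TRACE (FINE, FINER, ...): development-time detail.
            LOG.debug("Entering block placement loop");
        }
    }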
logging improvements for Hadoop
-------------------------------
Key: HADOOP-211
URL: http://issues.apache.org/jira/browse/HADOOP-211
Project: Hadoop
Type: Improvement
Versions: 0.2
Reporter: Sameer Paranjpye
Assignee: Sameer Paranjpye
Priority: Minor
Fix For: 0.3
Here's a proposal for some improvements to the way Hadoop does
logging. It advocates three broad changes to the way logging is
currently done, these being:
- The use of a uniform logging format by all Hadoop subsystems
- The use of Apache commons logging as a facade above an
underlying logging framework
- The use of Log4J as the underlying logging framework instead of
java.util.logging
This is largely polishing work, but it would make log analysis and
debugging easier in the short term. In the long term, it would
future-proof logging by allowing the underlying logging framework
to be changed with minimal code change. The proposed changes are
motivated by the following requirements, which we think Hadoop's
logging should meet:
- Hadoop's logs should be amenable to analysis by tools like grep,
sed, awk, etc.
- Log entries should be clearly annotated with a timestamp and a
logging level
- Log entries should be traceable to the subsystem from which they
originated
- The logging implementation should allow log entries to be
annotated with source code
location information like classname, methodname, file and line
number, without requiring
code changes
- It should be possible to change the logging implementation used
without having to change
thousands of lines of code
- The mapping of loggers to destinations (files, directories,
servers etc.) should be
specified and modifiable via configuration
Uniform logging format:
All Hadoop logs should have the following structure.
<Header>\n
<LogEntry>\n [<Exception>\n]
.
.
.
where the header line specifies the format of each log entry. The
header line has the format '# <Fieldname> <Fieldname>...\n'. The
default header is '# Timestamp Level LoggerName Message', i.e.
each log entry consists of the following fields:
- Timestamp is a date and time in the format MM/DD/YYYY:HH:MM:SS
- Level is the logging level (FATAL, WARN, DEBUG, TRACE, etc.)
- LoggerName is the short name of the logging subsystem from which
the message originated e.g.
fs.FSNamesystem, dfs.Datanode etc.
- Message is the log message produced
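For illustration, a log in this format might look like the
following (header plus two hypothetical entries):

    # Timestamp Level LoggerName Message
    05/17/2006:10:07:32 INFO dfs.DataNode Starting DataNode
    05/17/2006:10:07:45 WARN fs.FSNamesystem Low replication on block blk_1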
Why Apache commons logging and Log4J?
Apache commons logging is a facade meant to be used as a wrapper
around an underlying logging implementation. Bridges from Apache
commons logging to popular logging implementations (Java logging,
Log4J, Avalon etc.) are implemented and available as part of the
commons logging distribution. Implementing a bridge to an
unsupported implementation is fairly straightforward and involves
subclassing the commons logging LogFactory and Log classes. Making
all logging calls through Apache commons logging lets us move to a
different logging implementation, in the best case by simply
changing configuration, and otherwise with minimal code churn.
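A minimal sketch of the "change the implementation via
configuration" point, assuming the standard commons logging
discovery mechanism (the class name below is made up):

    // The backend is selected by configuration, e.g. a system property:
    //   -Dorg.apache.commons.logging.Log=
    //       org.apache.commons.logging.impl.Log4JLogger
    // or
    //   -Dorg.apache.commons.logging.Log=
    //       org.apache.commons.logging.impl.Jdk14Logger
    // The calling code is identical either way.
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class FacadeSketch {                   // hypothetical class
        private static final Log LOG = LogFactory.getLog(FacadeSketch.class);

        public static void main(String[] args) {
            LOG.info("logging through the facade");   // illustrative message
        }
    }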
Log4J offers a few benefits over java.util.logging that make it a
more desirable choice for the
logging back end.
- Configuration Flexibility: The mapping of loggers to
destinations (files, sockets etc.)
can be completely specified in configuration. This is possible
with Java logging as well, but its configuration is a lot more
restrictive. For instance, with Java logging all
log files must have names derived from the same pattern. For the
namenode, log files could
be named with the pattern "%h/namenode%u.log" which would put log
files in the user.home
directory with names like namenode0.log etc. With Log4J it would
be possible to configure
the namenode to emit log files with different names, say
heartbeats.log, namespace.log,
clients.log etc. Configuration variables in Log4J can also have
the values of system
properties embedded in them.
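As a sketch of that flexibility, the namenode example above could
be expressed programmatically as follows (logger names and file
names are illustrative; in practice the same mapping would live in
a Log4J configuration file rather than in code):

    import java.io.IOException;
    import org.apache.log4j.FileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class NamenodeLogSetup {               // hypothetical helper
        public static void configure() throws IOException {
            PatternLayout layout =
                new PatternLayout("%d{MM/dd/yyyy:HH:mm:ss} %p %c{2} %m%n");

            // Independently named log files for different concerns.
            Logger.getLogger("dfs.namenode.heartbeats")
                  .addAppender(new FileAppender(layout, "heartbeats.log"));
            Logger.getLogger("dfs.namenode.clients")
                  .addAppender(new FileAppender(layout, "clients.log"));
        }
    }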
- Takes wrappers into account: Log4J takes into account the
possibility that an application
may be invoking it via a wrapper, such as Apache commons logging.
This is important because
logging event objects must be able to infer the context of the
logging call such as classname,
methodname etc. Inferring context is a relatively expensive
operation that involves creating
an exception and examining the stack trace to find the frame just
before the first frame
of the logging framework. It is therefore done lazily only when
this information actually
needs to be logged. Log4J can be instructed to look for the frame
corresponding to the wrapper class; Java logging cannot. In the
case of Java logging this means
that a) the bridge from
Apache commons logging is responsible for inferring the calling
context and setting it in the
logging event and b) this inference has to be done on every
logging call regardless of whether
or not it is needed.
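A minimal sketch of the wrapper pattern described here (this is
not the actual commons logging bridge): the wrapper passes its own
fully qualified class name, so Log4J can skip the wrapper's stack
frames when, and only when, location information is needed.

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class MyLogWrapper {                   // hypothetical wrapper
        private static final String FQCN = MyLogWrapper.class.getName();
        private final Logger logger;

        public MyLogWrapper(Class<?> clazz) {
            this.logger = Logger.getLogger(clazz);
        }

        public void info(Object message) {
            // Log4J treats the first frame below FQCN as the caller,
            // and only inspects the stack if the layout actually asks
            // for location information (%C, %M, %F, %L).
            logger.log(FQCN, Level.INFO, message, null);
        }
    }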
- More handy features: Log4J has some handy features that Java
logging doesn't. A couple of examples:
a) Date-based rolling of log files
b) Format control through configuration. Log4J has a PatternLayout
class that can be configured to generate logs with a user-specified
pattern. The logging format described above can be expressed as
"%d{MM/dd/yyyy:HH:mm:ss} %p %c{2} %m". The format specifiers
indicate that each log line should have the date and time, followed
by the logging level or priority, the logger name, and the
application-generated message.
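Combining (a) and (b), a sketch of a date-rolled appender using
the format above (the file name and roll pattern are illustrative;
this would normally be specified in configuration rather than code):

    import java.io.IOException;
    import org.apache.log4j.DailyRollingFileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class RollingLogSetup {                // hypothetical helper
        public static void configure() throws IOException {
            // (b) the uniform log format expressed as a PatternLayout
            PatternLayout layout =
                new PatternLayout("%d{MM/dd/yyyy:HH:mm:ss} %p %c{2} %m%n");

            // (a) date-based rolling: a new file each day, suffixed
            // with the date
            DailyRollingFileAppender appender =
                new DailyRollingFileAppender(layout, "hadoop.log",
                                             "'.'yyyy-MM-dd");

            Logger.getRootLogger().addAppender(appender);
        }
    }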