Once we settle on something, let's post it on the twiki.
On May 17, 2006, at 10:07 AM, Doug Cutting (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-211?page=comments#action_12412213 ]
Doug Cutting commented on HADOOP-211:
-------------------------------------
The semantics I use for levels are something like:
SEVERE: if this is a production system, someone should be paged,
red lights should flash, etc. Something is definitely wrong and
the system is not operating correctly. Intervention is required.
This should be used sparingly.
WARN: in a production system, warnings should be propagated &
summarized on a central console. If lots are generated then
something may be wrong.
INFO, FINE, FINER, etc. are used for debugging. INFO is the level
normally logged in production, FINE, FINER, etc. are typically only
used when developing.
Is that consistent with the way others use these?
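As a rough sketch, that convention maps onto the commons logging
calls proposed below as follows (the class and the messages are
made up for illustration, and treating FINE/FINER as debug/trace
levels is an assumption):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class LevelConventionSketch {          // hypothetical class
        private static final Log LOG =
            LogFactory.getLog(LevelConventionSketch.class);

        void illustrate(Exception cause) {
            // SEVERE/FATAL: the system is not operating correctly;
            // someone should be paged. Used sparingly.
            LOG.fatal("Namespace image is corrupt; shutting down", cause);

            // WARN: propagated & summarized on a central console;
            // lots of these may indicate a problem.
            LOG.warn("Datanode heartbeat missed, will retry");

            // INFO: the level normally logged in production.
            LOG.info("Block report processed");

            // DEBUG/TRACE (FINE, FINER, ...): development-time detail.
            LOG.debug("Entering block placement loop");
        }
    }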
logging improvements for Hadoop
-------------------------------
Key: HADOOP-211
URL: http://issues.apache.org/jira/browse/HADOOP-211
Project: Hadoop
Type: Improvement
Versions: 0.2
Reporter: Sameer Paranjpye
Assignee: Sameer Paranjpye
Priority: Minor
Fix For: 0.3
Here's a proposal for some improvements to the way Hadoop does
logging. It advocates three broad changes to the way logging is
currently done, these being:
- The use of a uniform logging format by all Hadoop subsystems
- The use of Apache commons logging as a facade above an
underlying logging framework
- The use of Log4J as the underlying logging framework instead of
java.util.logging
This is largely polishing work, but it would make log analysis and
debugging easier in the short term. In the long term, it would
future-proof logging by allowing the underlying logging framework
to be changed with minimal code change. The proposed changes are
motivated by the following requirements, which we think Hadoop's
logging should meet:
- Hadoop's logs should be amenable to analysis by tools like grep,
sed, awk, etc.
- Log entries should be clearly annotated with a timestamp and a
logging level
- Log entries should be traceable to the subsystem from which they
originated
- The logging implementation should allow log entries to be
annotated with source code
location information like classname, methodname, file and line
number, without requiring
code changes
- It should be possible to change the logging implementation used
without having to change
thousands of lines of code
- The mapping of loggers to destinations (files, directories,
servers etc.) should be
specified and modifiable via configuration
Uniform logging format:
All Hadoop logs should have the following structure.
<Header>\n
<LogEntry>\n [<Exception>\n]
.
.
.
where the header line specifies the format of each log entry. The
header line has the format '# <Fieldname> <Fieldname>...\n'. The
default header is '# Timestamp Level LoggerName Message', i.e.
each log entry consists of the following fields:
- Timestamp is a date and time in the format MM/DD/YYYY:HH:MM:SS
- Level is the logging level (FATAL, WARN, DEBUG, TRACE, etc.)
- LoggerName is the short name of the logging subsystem from which
the message originated e.g.
fs.FSNamesystem, dfs.Datanode etc.
- Message is the log message produced
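For illustration, a log in this format might look like the
following (header plus two hypothetical entries):

    # Timestamp Level LoggerName Message
    05/17/2006:10:07:32 INFO dfs.DataNode Starting DataNode
    05/17/2006:10:07:45 WARN fs.FSNamesystem Low replication on block blk_1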
Why Apache commons logging and Log4J?
Apache commons logging is a facade meant to be used as a wrapper
around an underlying logging implementation. Bridges from Apache
commons logging to popular logging implementations (Java logging,
Log4J, Avalon etc.) are implemented and available as part of the
commons logging distribution. Implementing a bridge to an
unsupported implementation is fairly straightforward and involves
subclassing the commons logging LogFactory and Log classes. Making
all logging calls through Apache commons logging lets us move to a
different logging implementation, in the best case by simply
changing configuration, and otherwise with minimal code churn.
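A minimal sketch of the "change the implementation via
configuration" point, assuming the standard commons logging
discovery mechanism (the class name below is made up):

    // The backend is selected by configuration, e.g. a system property:
    //   -Dorg.apache.commons.logging.Log=
    //       org.apache.commons.logging.impl.Log4JLogger
    // or
    //   -Dorg.apache.commons.logging.Log=
    //       org.apache.commons.logging.impl.Jdk14Logger
    // The calling code is identical either way.
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class FacadeSketch {                   // hypothetical class
        private static final Log LOG = LogFactory.getLog(FacadeSketch.class);

        public static void main(String[] args) {
            LOG.info("logging through the facade");   // illustrative message
        }
    }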
Log4J offers a few benefits over java.util.logging that make it a
more desirable choice for the
logging back end.
- Configuration Flexibility: The mapping of loggers to
destinations (files, sockets etc.)
can be completely specified in configuration. This is possible
with Java logging as well, but its configuration is a lot more
restrictive. For instance, with Java logging all
log files must have names derived from the same pattern. For the
namenode, log files could
be named with the pattern "%h/namenode%u.log" which would put log
files in the user.home
directory with names like namenode0.log etc. With Log4J it would
be possible to configure
the namenode to emit log files with different names, say
heartbeats.log, namespace.log,
clients.log etc. Configuration variables in Log4J can also have
the values of system
properties embedded in them.
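As a sketch of that flexibility, the namenode example above could
be expressed programmatically as follows (logger names and file
names are illustrative; in practice the same mapping would live in
a Log4J configuration file rather than in code):

    import java.io.IOException;
    import org.apache.log4j.FileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class NamenodeLogSetup {               // hypothetical helper
        public static void configure() throws IOException {
            PatternLayout layout =
                new PatternLayout("%d{MM/dd/yyyy:HH:mm:ss} %p %c{2} %m%n");

            // Independently named log files for different concerns.
            Logger.getLogger("dfs.namenode.heartbeats")
                  .addAppender(new FileAppender(layout, "heartbeats.log"));
            Logger.getLogger("dfs.namenode.clients")
                  .addAppender(new FileAppender(layout, "clients.log"));
        }
    }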
- Takes wrappers into account: Log4J takes into account the
possibility that an application
may be invoking it via a wrapper, such as Apache commons logging.
This is important because
logging event objects must be able to infer the context of the
logging call such as classname,
methodname etc. Inferring context is a relatively expensive
operation that involves creating
an exception and examining the stack trace to find the frame just
before the first frame
of the logging framework. It is therefore done lazily only when
this information actually
needs to be logged. Log4J can be instructed to look for the frame
corresponding to the wrapper class; Java logging cannot. In the
case of Java logging this means
that a) the bridge from
Apache commons logging is responsible for inferring the calling
context and setting it in the
logging event and b) this inference has to be done on every
logging call regardless of whether
or not it is needed.
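A minimal sketch of the wrapper pattern described here (this is
not the actual commons logging bridge): the wrapper passes its own
fully qualified class name, so Log4J can skip the wrapper's stack
frames when, and only when, location information is needed.

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class MyLogWrapper {                   // hypothetical wrapper
        private static final String FQCN = MyLogWrapper.class.getName();
        private final Logger logger;

        public MyLogWrapper(Class<?> clazz) {
            this.logger = Logger.getLogger(clazz);
        }

        public void info(Object message) {
            // Log4J treats the first frame below FQCN as the caller,
            // and only inspects the stack if the layout actually asks
            // for location information (%C, %M, %F, %L).
            logger.log(FQCN, Level.INFO, message, null);
        }
    }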
- More handy features: Log4J has some handy features that Java
logging doesn't. A couple of examples:
a) Date-based rolling of log files
b) Format control through configuration. Log4J has a PatternLayout
class that can be configured to generate logs with a user-specified
pattern. The logging format described above can be expressed as
"%d{MM/dd/yyyy:HH:mm:ss} %p %c{2} %m". The format specifiers
indicate that each log line should have the date and time, followed
by the logging level or priority, the logger name, and the
application-generated message.
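Combining (a) and (b), a sketch of a date-rolled appender using
the format above (the file name and roll pattern are illustrative;
this would normally be specified in configuration rather than code):

    import java.io.IOException;
    import org.apache.log4j.DailyRollingFileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class RollingLogSetup {                // hypothetical helper
        public static void configure() throws IOException {
            // (b) the uniform log format expressed as a PatternLayout
            PatternLayout layout =
                new PatternLayout("%d{MM/dd/yyyy:HH:mm:ss} %p %c{2} %m%n");

            // (a) date-based rolling: a new file each day, suffixed
            // with the date
            DailyRollingFileAppender appender =
                new DailyRollingFileAppender(layout, "hadoop.log",
                                             "'.'yyyy-MM-dd");

            Logger.getRootLogger().addAppender(appender);
        }
    }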