I am currently on a project using log4j for the first time.  We're using
1.1.3.  I had many concerns with the amount of transient objects created
during logging to a DailyRollingFileAppender.  I refactored log4j making
memory usage improvements that I think could be very useful to log4j users.

One recent test showed 350MB of transient memory created by log4j to log 33
MB of data consisting of 33500 log entries; the number of transient objects
numbered in the millions and was uncountable by my tools.  The layout
pattern used was:

log4j.appender.R.layout.ConversionPattern=%d{MMM dd yyyy HH:mm:ss,SSS} %t %
-5p %c - %m%n

Without date formatting the total was ~275 MB.

I appreciate the functionality offered by this framework but I find it's
disregard for memory consumption disturbing.  I refactored it and I can now
run the same test generating only 4 MB of transient data and 105000
transient objects for the same 33 MB data set.  The remaining 4 MB and
objects are JDK deficiencies and can't be overcome easily:
1)  The JDK's Thread class stores it's name as a char[].  Every time
Thread.getName() is called that char[] is converted to a String resulting
in 2 new objects, the String and it's associated char[].  I am going to see
if I can get a message through to the Sun folks to look at this.  Of
course, this framework could store the name once in a local variable on
first lookup, but this assumes the Thread name will never change
2)  The sun.io.CharToByteSingleByte.convert method leaks a byte[1] every
time invoked.

This means for N messages logged, N * 3 objects are created from the JDK
when outputting Thread name and using a FileAppender.  I submit any other
transient object creation currently created by log4j is unwarranted and a
potential performance issue.

3) What if you knew logging a message would cost 1 KB of data for every
message sent no matter the size?  Well if you output the date using default
log4j functionality it does.  java.text.DateFormat.format generates 30
transient objects using ~1 KB for every call.  I wouldn't recommend to use
DateFormat to stream dates, it is not built into it's functionality to do
it efficiently.  It is possible to output the same information without
using any transient objects.  From my tests, a large % of the objects were
created from date formatting.

LOG 4J Architecture Problems WRT Memory:
The problem with Log4J is that it uses a StringBuffer to format messages.
This means that the resulting formatted message is a transient object, it
is ready for GC as soon as the entry is logged.  This may highly degrade
system performance.  Not only does the message waste space, it creates many
objects in the GC graph and makes GC run longer - wasted CPU cycles.

One advancement I noticed in Log4J 1.2.6 was the use of a BufferedWriter
around FileWriters in FileAppenders.  This step saves the final formatted
StringBuffer response from being copied into transient memory, a char[] the
size of the StringBuffer, one last time in FileWriter.write(String).
Instead it is copied into the BufferedWriter's reusable char[] which will
call FileWriter.write(char[]).  This stops creating the message 3 times in
memory, it is now only 2.  Can it be 1?  It is 2 times in memory because
the current architecture requires the copying into the StringBuffer.  How
about getting rid of the StringBuffer and directly appending to the
BufferedWriter?  Instead of formatting to memory that is ready for GC as
soon as logging is over, format it in reusable memory in the
BufferedWriter.  It is cleaner, more efficient and is a first step towards
0 transient object generation and 0 excess memory usage during logging.

To achieve this goal I added these methods to the following classes:
public abstract class Layout implements OptionHandler {
      abstract public void stream(LoggingEvent event, Writer out) throws
IOException;
}

public abstract class PatternConverter {
      public final void format(Writer out, LoggingEvent e) throws
IOException {
            convert(e, out);
      }

      abstract protected void convert(LoggingEvent event, Writer out)
throws IOException;
            // Convert and send directly to writer

      protected final void formatStringToWriter(String s, Writer out)
throws IOException {}
            // Since most converters will simply output a String, this is
added as a utility
            // and will format the String correctly to the writer,
}

An appender would call the Layout.stream(LoggingEvent, Writer) method
instead of format.  If an appender was interested in a string of the
layout, a StringWriter could be passed in.  Since the synchronization of
the logging event is at the AppenderSkeleton.doAppend method and before the
formatting of the response, writing to the BufferedWriter instead of a
StringBuffer is thread safe.  It is the same effect except the char[] in
BufferedWriter is reusable and requires one less copy.

Objects that represent XML can be mapped to an ObjectFormatter and be
streamed to disk instead of first having an extra copy in memory.  This has
been a huge savings on my project.  If the XML is 170 KB on disk it costs
NO transient JVM heap memory to stream the Object as XML to disk; there is
never a complete in memory representation of the XML as a String, this is
not possible with the default log4j implementation.  Of course the XML
Binding Framework I use is 0 transient object usage.  With log4j 1.1.3 it
costs at least 2 in memory representations and with 1.2.6 at least 1.

Since formatting dates was a critical concern, I created a PatternConverter
that creates 1 Date and 1 Calendar.  These objects are used in conjunction
with a set of numbers pre-converted to Strings to format the date with less
CPU cycles and no transient memory compared to the default implementation.
Since the set of numbers for date formatting(hours, min, secs, ms, date,
month) is finite I pre-created a set of the numbers.  I simply created a
static array of these numbers, it is a simple lookup: number[10] = "10".
Theoretically, to format the date with almost the same precision and
flexibility as DateFormatter with no excess memory one could just use log4j
parameter markers to format the date provided the correct Converters are
created.  I propose the Date object become a member of LoggingEvent where
it can be initialized once per logging event and date parameters pulled
from a Converter.

One of the last pieces of memory to clean up was the LoggingEvents
themselves.  There is no reason why these objects can't be pooled.  I added
this functionality to an extended Category and LoggingEvent and removed the
1 time use of these objects.  To accomplish this it was neccesary to make
some changes to LoggingEvent to re-initialize itself.  I view the logging
event as a thread resource, but looking up in a pool is just as easy.

I view these changes as a positive step to log4j.  I don't know how the
process works of proposing/making changes to the project.  If I can be
helpful in advancing these ideas in any way please let me know.

Thanks,
Chris Nokes
Perficient, Inc



--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Reply via email to