I am currently on a project using log4j for the first time. We're using 1.1.3. I had many concerns with the amount of transient objects created during logging to a DailyRollingFileAppender. I refactored log4j making memory usage improvements that I think could be very useful to log4j users.
One recent test showed 350MB of transient memory created by log4j to log 33 MB of data consisting of 33500 log entries; the number of transient objects numbered in the millions and was uncountable by my tools. The layout pattern used was: log4j.appender.R.layout.ConversionPattern=%d{MMM dd yyyy HH:mm:ss,SSS} %t % -5p %c - %m%n Without date formatting the total was ~275 MB. I appreciate the functionality offered by this framework but I find it's disregard for memory consumption disturbing. I refactored it and I can now run the same test generating only 4 MB of transient data and 105000 transient objects for the same 33 MB data set. The remaining 4 MB and objects are JDK deficiencies and can't be overcome easily: 1) The JDK's Thread class stores it's name as a char[]. Every time Thread.getName() is called that char[] is converted to a String resulting in 2 new objects, the String and it's associated char[]. I am going to see if I can get a message through to the Sun folks to look at this. Of course, this framework could store the name once in a local variable on first lookup, but this assumes the Thread name will never change 2) The sun.io.CharToByteSingleByte.convert method leaks a byte[1] every time invoked. This means for N messages logged, N * 3 objects are created from the JDK when outputting Thread name and using a FileAppender. I submit any other transient object creation currently created by log4j is unwarranted and a potential performance issue. 3) What if you knew logging a message would cost 1 KB of data for every message sent no matter the size? Well if you output the date using default log4j functionality it does. java.text.DateFormat.format generates 30 transient objects using ~1 KB for every call. I wouldn't recommend to use DateFormat to stream dates, it is not built into it's functionality to do it efficiently. It is possible to output the same information without using any transient objects. From my tests, a large % of the objects were created from date formatting. LOG 4J Architecture Problems WRT Memory: The problem with Log4J is that it uses a StringBuffer to format messages. This means that the resulting formatted message is a transient object, it is ready for GC as soon as the entry is logged. This may highly degrade system performance. Not only does the message waste space, it creates many objects in the GC graph and makes GC run longer - wasted CPU cycles. One advancement I noticed in Log4J 1.2.6 was the use of a BufferedWriter around FileWriters in FileAppenders. This step saves the final formatted StringBuffer response from being copied into transient memory, a char[] the size of the StringBuffer, one last time in FileWriter.write(String). Instead it is copied into the BufferedWriter's reusable char[] which will call FileWriter.write(char[]). This stops creating the message 3 times in memory, it is now only 2. Can it be 1? It is 2 times in memory because the current architecture requires the copying into the StringBuffer. How about getting rid of the StringBuffer and directly appending to the BufferedWriter? Instead of formatting to memory that is ready for GC as soon as logging is over, format it in reusable memory in the BufferedWriter. It is cleaner, more efficient and is a first step towards 0 transient object generation and 0 excess memory usage during logging. To achieve this goal I added these methods to the following classes: public abstract class Layout implements OptionHandler { abstract public void stream(LoggingEvent event, Writer out) throws IOException; } public abstract class PatternConverter { public final void format(Writer out, LoggingEvent e) throws IOException { convert(e, out); } abstract protected void convert(LoggingEvent event, Writer out) throws IOException; // Convert and send directly to writer protected final void formatStringToWriter(String s, Writer out) throws IOException {} // Since most converters will simply output a String, this is added as a utility // and will format the String correctly to the writer, } An appender would call the Layout.stream(LoggingEvent, Writer) method instead of format. If an appender was interested in a string of the layout, a StringWriter could be passed in. Since the synchronization of the logging event is at the AppenderSkeleton.doAppend method and before the formatting of the response, writing to the BufferedWriter instead of a StringBuffer is thread safe. It is the same effect except the char[] in BufferedWriter is reusable and requires one less copy. Objects that represent XML can be mapped to an ObjectFormatter and be streamed to disk instead of first having an extra copy in memory. This has been a huge savings on my project. If the XML is 170 KB on disk it costs NO transient JVM heap memory to stream the Object as XML to disk; there is never a complete in memory representation of the XML as a String, this is not possible with the default log4j implementation. Of course the XML Binding Framework I use is 0 transient object usage. With log4j 1.1.3 it costs at least 2 in memory representations and with 1.2.6 at least 1. Since formatting dates was a critical concern, I created a PatternConverter that creates 1 Date and 1 Calendar. These objects are used in conjunction with a set of numbers pre-converted to Strings to format the date with less CPU cycles and no transient memory compared to the default implementation. Since the set of numbers for date formatting(hours, min, secs, ms, date, month) is finite I pre-created a set of the numbers. I simply created a static array of these numbers, it is a simple lookup: number[10] = "10". Theoretically, to format the date with almost the same precision and flexibility as DateFormatter with no excess memory one could just use log4j parameter markers to format the date provided the correct Converters are created. I propose the Date object become a member of LoggingEvent where it can be initialized once per logging event and date parameters pulled from a Converter. One of the last pieces of memory to clean up was the LoggingEvents themselves. There is no reason why these objects can't be pooled. I added this functionality to an extended Category and LoggingEvent and removed the 1 time use of these objects. To accomplish this it was neccesary to make some changes to LoggingEvent to re-initialize itself. I view the logging event as a thread resource, but looking up in a pool is just as easy. I view these changes as a positive step to log4j. I don't know how the process works of proposing/making changes to the project. If I can be helpful in advancing these ideas in any way please let me know. Thanks, Chris Nokes Perficient, Inc -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>