Hi Chris, Object reuse and optimizing memory usage was one of the themes I was seriously considering for future log4j releases.
Your results are somewhat surprising. I knew memory usage could be improved but hadn't realized the extent... More comments below. At 10:42 03.10.2002 -0500, [EMAIL PROTECTED] wrote: >I am currently on a project using log4j for the first time. We're using >1.1.3. I had many concerns with the amount of transient objects created >during logging to a DailyRollingFileAppender. I refactored log4j making >memory usage improvements that I think could be very useful to log4j users. > >One recent test showed 350MB of transient memory created by log4j to log 33 >MB of data consisting of 33500 log entries; the number of transient objects >numbered in the millions and was uncountable by my tools. The layout >pattern used was: Millions? Wow! >log4j.appender.R.layout.ConversionPattern=%d{MMM dd yyyy HH:mm:ss,SSS} %t % >-5p %c - %m%n > >Without date formatting the total was ~275 MB. What happens with just %d (with nothing within the braces)? With %r? With no time information? >I appreciate the functionality offered by this framework but I find it's >disregard for memory consumption disturbing. I refactored it and I can now >run the same test generating only 4 MB of transient data and 105000 >transient objects for the same 33 MB data set. The remaining 4 MB and >objects are JDK deficiencies and can't be overcome easily: >1) The JDK's Thread class stores it's name as a char[]. Every time >Thread.getName() is called that char[] is converted to a String resulting >in 2 new objects, the String and it's associated char[]. I am going to see >if I can get a message through to the Sun folks to look at this. Of >course, this framework could store the name once in a local variable on >first lookup, but this assumes the Thread name will never change We could circumvent calling Thread.getName() by caching the name of the thread. In general however, I deem it bad practice to go try to be smarter than the language. >2) The sun.io.CharToByteSingleByte.convert method leaks a byte[1] every >time invoked. You must be kidding. :-) >This means for N messages logged, N * 3 objects are created from the JDK >when outputting Thread name and using a FileAppender. I submit any other >transient object creation currently created by log4j is unwarranted and a >potential performance issue. Agreed. In log4j defense, I should say that log4j aims to be reliable, fast and extensible, in that order of priority. It is easier to be reliable with simpler but albeit less optimized code. >3) What if you knew logging a message would cost 1 KB of data for every >message sent no matter the size? Well if you output the date using default >log4j functionality it does. java.text.DateFormat.format generates 30 >transient objects using ~1 KB for every call. I wouldn't recommend to use >DateFormat to stream dates, it is not built into it's functionality to do >it efficiently. It is possible to output the same information without >using any transient objects. From my tests, a large % of the objects were >created from date formatting. Do you mean SimpleDateFormat or just DateFormat. SimpleDateFormat is very inefficient. However, ISO8601DateFormat, DateTimeDateFormat and AbsoluteTimeDateFormat that ship with log4j and which all derive from DateFormat are supposed to be much more efficient. So I reiterate my question. Do you mean SimpleDateFormat or DateFormat? >LOG 4J Architecture Problems WRT Memory: >The problem with Log4J is that it uses a StringBuffer to format messages. >This means that the resulting formatted message is a transient object, it >is ready for GC as soon as the entry is logged. This may highly degrade >system performance. Not only does the message waste space, it creates many >objects in the GC graph and makes GC run longer - wasted CPU cycles. Sure, although recent JVMs are supposed to be much more efficient with respect to garbage collection. >One advancement I noticed in Log4J 1.2.6 was the use of a BufferedWriter >around FileWriters in FileAppenders. This step saves the final formatted >StringBuffer response from being copied into transient memory, a char[] the >size of the StringBuffer, one last time in FileWriter.write(String). >Instead it is copied into the BufferedWriter's reusable char[] which will >call FileWriter.write(char[]). This stops creating the message 3 times in >memory, it is now only 2. Can it be 1? It is 2 times in memory because >the current architecture requires the copying into the StringBuffer. How >about getting rid of the StringBuffer and directly appending to the >BufferedWriter? Instead of formatting to memory that is ready for GC as >soon as logging is over, format it in reusable memory in the >BufferedWriter. It is cleaner, more efficient and is a first step towards >0 transient object generation and 0 excess memory usage during logging. It would be a major step towards 0 transient object generation, 0 excess memory usage during logging and more efficient, that much is certain. IMHO, the current approach is clean and very generic but optimal it is not. >To achieve this goal I added these methods to the following classes: >public abstract class Layout implements OptionHandler { > abstract public void stream(LoggingEvent event, Writer out) throws >IOException; >} > >public abstract class PatternConverter { > public final void format(Writer out, LoggingEvent e) throws >IOException { > convert(e, out); > } > > abstract protected void convert(LoggingEvent event, Writer out) >throws IOException; > // Convert and send directly to writer > > protected final void formatStringToWriter(String s, Writer out) >throws IOException {} > // Since most converters will simply output a String, this is >added as a utility > // and will format the String correctly to the writer, >} Did you remove Layout.format(LoggingEvent) returning a String? >An appender would call the Layout.stream(LoggingEvent, Writer) method >instead of format. If an appender was interested in a string of the >layout, a StringWriter could be passed in. Since the synchronization of >the logging event is at the AppenderSkeleton.doAppend method and before the >formatting of the response, writing to the BufferedWriter instead of a >StringBuffer is thread safe. It is the same effect except the char[] in >BufferedWriter is reusable and requires one less copy. So you are suggesting that Layout.format(LoggingEvent) no longer be used because it can be replaced by Layout.stream(LoggingEvent, Writer)? Appenders such as SMTPAppender that handle the String returned by Layout.format(LoggingEvent) can use Layout.stream(LoggingEvent, Writer) where the writer is a StringWriter. Is that what you saying? >Objects that represent XML can be mapped to an ObjectFormatter and be >streamed to disk instead of first having an extra copy in memory. This has >been a huge savings on my project. If the XML is 170 KB on disk it costs >NO transient JVM heap memory to stream the Object as XML to disk; there is >never a complete in memory representation of the XML as a String, this is >not possible with the default log4j implementation. Of course the XML >Binding Framework I use is 0 transient object usage. With log4j 1.1.3 it >costs at least 2 in memory representations and with 1.2.6 at least 1. I don't understand. What do you mean by Objects that represent XML? >Since formatting dates was a critical concern, I created a PatternConverter >that creates 1 Date and 1 Calendar. These objects are used in conjunction >with a set of numbers pre-converted to Strings to format the date with less >CPU cycles and no transient memory compared to the default implementation. >Since the set of numbers for date formatting(hours, min, secs, ms, date, >month) is finite I pre-created a set of the numbers. I simply created a >static array of these numbers, it is a simple lookup: number[10] = "10". Good one. You did this to the DateFormatter that ship with log4j? >Theoretically, to format the date with almost the same precision and >flexibility as DateFormatter with no excess memory one could just use log4j >parameter markers to format the date provided the correct Converters are >created. I propose the Date object become a member of LoggingEvent where >it can be initialized once per logging event and date parameters pulled >from a Converter. That is very much thought provoking. I'm not 100% sure the approach is problem free but it sure is very original. >One of the last pieces of memory to clean up was the LoggingEvents >themselves. There is no reason why these objects can't be pooled. I added >this functionality to an extended Category and LoggingEvent and removed the >1 time use of these objects. To accomplish this it was neccesary to make >some changes to LoggingEvent to re-initialize itself. I view the logging >event as a thread resource, but looking up in a pool is just as easy. Yes, although the gains are not necessary that significant because there is just one LoggingEvent created for each, well..., logging event. If as you say 33'000 logs generate millions of objects than 33'000 less will not make that much of a difference. Would you agree? >I view these changes as a positive step to log4j. I don't know how the >process works of proposing/making changes to the project. If I can be >helpful in advancing these ideas in any way please let me know. A very positive step indeed! You are doing great. Posting on log4j-dev was the right thing to do. >Thanks, >Chris Nokes >Perficient, Inc No, thank you. -- Ceki TCP implementations will follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others. -- Jon Postel, RFC 793 -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>