Hi all,

We use the Virgo server and it ships with Logback 1.0.7.v20121108-1250. One of the 
users of our product started complaining that after about a week of use the product 
became so slow that it was basically unusable. We had a live session with the client 
and it turned out that the problem was in Logback. The client runs Windows; so far 
the problem has not been reported on Linux. Here's what I did to determine that 
Logback was the cause of the problem:


1. The appender is defined as:


   <appender name="LOG_FILE" 
class="ch.qos.logback.core.rolling.RollingFileAppender">
      <file>D:/logs/product.log</file>
      <rollingPolicy 
class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
         <FileNamePattern>D:/logs/product_%i.log.zip</FileNamePattern>
         <MinIndex>0</MinIndex>
         <MaxIndex>9</MaxIndex>
      </rollingPolicy>

      <triggeringPolicy 
class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
         <MaxFileSize>50MB</MaxFileSize>
      </triggeringPolicy>
      <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
         <Pattern>[%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX}] [%-5level] %-28.28thread 
%X{sessionGuid} %-64.64logger{64} %X{medic.eventCode} %msg %ex%n</Pattern>
      </encoder>
   </appender>


In the appender configuration above, I modified the directories and log names a 
little bit, so that the actual product name would not be visible.
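
For context, here is my rough understanding of what the FixedWindowRollingPolicy/ZIP rollover does each time it fires, written as a simplified Java sketch. This is NOT the actual Logback source (zipCompress() below is a made-up helper); it's just the mental model I used while debugging, based on the class names that show up in the thread dumps further down:

    // Simplified sketch of a fixed-window rollover with ZIP compression -- not the real Logback code.
    // The window here is product_0.log.zip .. product_9.log.zip (MinIndex=0, MaxIndex=9).
    void rollover() {
        new File("D:/logs/product_9.log.zip").delete();              // drop the oldest archive
        for (int i = 8; i >= 0; i--) {                               // shift the window by one slot
            File archive = new File("D:/logs/product_" + i + ".log.zip");
            if (archive.exists()) {
                archive.renameTo(new File("D:/logs/product_" + (i + 1) + ".log.zip"));
            }
        }
        // compress the active file into slot 0; the appender then reopens a fresh product.log
        zipCompress("D:/logs/product.log", "D:/logs/product_0.log.zip");   // hypothetical helper
    }

If any of those rename/compress/delete steps fails quietly (for example because some other process has a handle on one of the files), the active file would never shrink, which seems consistent with what we observed.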


2. All the rolled-over zip files were generated and present, and the main log file, 
product.log, had exceeded its maximum size of 50 MB and reached 64 MB. We had 
something like the listing below (a sketch of the size check that should trigger 
rollover follows it):


product.log           <------- 64 MB
product_0.log.zip
product_1.log.zip
product_2.log.zip
product_3.log.zip
product_4.log.zip
product_5.log.zip
product_6.log.zip
product_7.log.zip
product_8.log.zip
product_9.log.zip
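
My understanding (and it is only that) is that SizeBasedTriggeringPolicy boils down to a length check against the active file, performed from subAppend(), roughly like the sketch below. Again, this is a simplification with a hypothetical field name, not the real implementation -- the real one may, for instance, only perform the check every few events:

    // Simplified sketch of a size-based triggering check -- not the actual Logback code.
    // maxFileSizeInBytes would be 50 * 1024 * 1024 for our <MaxFileSize>50MB</MaxFileSize> setting.
    public boolean isTriggeringEvent(File activeFile, Object event) {
        return activeFile.length() >= maxFileSizeInBytes;   // maxFileSizeInBytes is a hypothetical field
    }

If that is roughly right, then with product.log stuck at 64 MB the check keeps coming back true, so a rollover keeps being re-attempted, which would explain why the archives reappeared immediately in step 4 below.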


3. This was suspicious, so I examined the log file and saw many messages that 
were logged out of order, e.g.


09:55:43.676
09:55:42.836
09:55:46.165
09:55:44.583
...


Furthermore, even the simplest operations, e.g. simply sending a redirect response 
back to the browser, were taking several seconds. (Before sending the redirect we 
log the URL to which we are redirecting, so I suspected that the logger was 
extremely slow.)


4. I moved the archived log files to a different directory and they got recreated 
*immediately*. I did this 7-8 times, each time with exactly the same result: the 
.zip files were recreated immediately.


5. The main log, product.log, was locked by the JVM, so I couldn't move or delete 
it. However, I opened it in a text editor, selected the whole content (Ctrl+A), 
deleted it (Del) and saved the file. As soon as I did that, the performance problems 
disappeared and the product began working normally. The log file also resumed 
growing at a normal speed, whereas before it had been growing extremely slowly -- 
presumably because the truncation dropped its length back below the 50 MB threshold, 
so the size check stopped requesting rollovers.



So, we told the users to apply the following workaround whenever the preconditions 
from step 2 above are reached and the product becomes very slow:


(1) Open the main log file, select all, delete, save. (A programmatic sketch of this truncation follows the list.)

(2) Delete all the archived .zip files.
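
For what it's worth, step (1) can probably also be scripted. The sketch below (my own, untested in the users' environment) truncates the active file in place, which is effectively what the select-all/delete/save edit in a text editor does; whether it works while the JVM and/or Splunk still hold the file open will depend on the Windows sharing modes involved:

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    // Truncate the active log file in place -- roughly what the manual text-editor edit does.
    // The path is our deployment-specific example, not something Logback mandates.
    public class TruncateLog {
        public static void main(String[] args) throws IOException {
            try (FileChannel channel = FileChannel.open(Paths.get("D:/logs/product.log"),
                    StandardOpenOption.WRITE)) {
                channel.truncate(0);
            }
        }
    }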


This workaround has been working fine so far. However, we still don't know why 
Logback behaves like this, which is why I'm writing this email. Can someone who 
knows the Logback source code figure out what might be happening and what 
triggers it?


Here are some additional details that we've been able to figure out:


1. When the users were experiencing the problem during the live session, I opened 
Process Explorer and saw that Splunk was holding a lock on the main log file, 
product.log. I asked the users to kill the Splunk service and they did, but that 
didn't solve the problem. However, depending on how Logback is written, it may turn 
out that it did indeed go haywire because of Splunk, but that once it has, killing 
the other process that holds a lock on the file no longer helps.
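
Because the rollover path in the thread dumps below goes through renaming and compressing files, my suspicion is that an external handle on product.log (Splunk, or anything else) is enough to make those file operations fail on Windows. The small stand-alone test below (my own, with made-up file names) illustrates the general behavior I mean: File.renameTo() typically fails on Windows while another stream still has the file open, whereas on Linux the same rename usually succeeds. It only illustrates the kind of interference I suspect, not what Logback actually does internally:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.IOException;

    // Demonstrates that, on Windows, renaming a file that is still open in another
    // stream (or another process such as Splunk) typically fails.
    public class RenameWhileOpen {
        public static void main(String[] args) throws IOException {
            File src = new File("rename-test.log");
            try (FileWriter writer = new FileWriter(src)) {
                writer.write("some log content\n");
            }
            try (FileInputStream held = new FileInputStream(src)) {   // simulate an external reader
                boolean renamed = src.renameTo(new File("rename-test-renamed.log"));
                System.out.println("rename while file is held open: " + renamed);   // false on Windows
            }
        }
    }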


2. We managed to reproduce the same behavior in our company by letting two 
different appenders write to the same log file. As soon as the preconditions from 
step 2 above were reached, everything became extremely slow, the logs couldn't roll 
over, and the main log started growing beyond its configured limit of 50 MB. 
However, the users do *not* have a configuration in which two appenders write to 
the same file. I'm only describing how we managed to reproduce the slowness in case 
it helps you figure out what's wrong. The end result is the same -- Logback slows 
down to a crawl, but apparently the trigger in our users' environment is different 
and is not related to appenders writing to the same file.


3. A thread dump while we were reproducing the problem (with two appenders 
writing to the same file) reveals the following:


3.1. One thread holds a lock (0x00000000dfa5d600):


"Thread-48" #199 daemon prio=5 os_prio=0 tid=0x0000000024b61000 nid=0x772c 
runnable [0x000000002422e000]
   java.lang.Thread.State: RUNNABLE
    at java.util.zip.Deflater.deflateBytes(Native Method)
    at java.util.zip.Deflater.deflate(Deflater.java:442)
    - locked <0x00000000fdb8dcf0> (a java.util.zip.ZStreamRef)
    at java.util.zip.Deflater.deflate(Deflater.java:364)
    at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:251)
    at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
    at java.util.zip.ZipOutputStream.write(ZipOutputStream.java:331)
    - locked <0x00000000fdb8da00> (a java.util.zip.ZipOutputStream)
    at 
ch.qos.logback.core.rolling.helper.Compressor.zipCompress(Compressor.java:110)
    at 
ch.qos.logback.core.rolling.helper.Compressor.compress(Compressor.java:58)
    at 
ch.qos.logback.core.rolling.FixedWindowRollingPolicy.rollover(FixedWindowRollingPolicy.java:147)
    at 
ch.qos.logback.core.rolling.RollingFileAppender.rollover(RollingFileAppender.java:111)
    - locked <0x00000000dfa5d460> (a ch.qos.logback.core.spi.LogbackLock)
    at 
ch.qos.logback.core.rolling.RollingFileAppender.subAppend(RollingFileAppender.java:144)
    - locked <0x00000000dfa5d600> (a 
ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy)
    at 
ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
    at 
ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
    at 
ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
    at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:280)
    at ch.qos.logback.classic.Logger.callAppenders(Logger.java:267)
    at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:449)
    at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:403)
    at ch.qos.logback.classic.Logger.log(Logger.java:803)
    at 
org.apache.commons.logging.impl.SLF4JLocationAwareLog.warn(SLF4JLocationAwareLog.java:185)
    ...


3.2. A multitude of threads are blocked and waiting to acquire that lock, e.g.:


"Thread-55" #206 daemon prio=5 os_prio=0 tid=0x000000001f50d000 nid=0x7288 
waiting for monitor entry [0x00000000263ef000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at 
ch.qos.logback.core.rolling.RollingFileAppender.subAppend(RollingFileAppender.java:142)
    - waiting to lock <0x00000000dfa5d600> (a 
ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy)
    at 
ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
    at 
ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
    at 
ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
    at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:280)
    at ch.qos.logback.classic.Logger.callAppenders(Logger.java:267)
    at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:449)
    at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:403)
    at ch.qos.logback.classic.Logger.log(Logger.java:803)
    at 
org.apache.commons.logging.impl.SLF4JLocationAwareLog.warn(SLF4JLocationAwareLog.java:185)
    ...




I have attached the logging configuration from the Virgo server. It's called 
serviceability.xml (that's the default configuration file name for the Virgo 
server) and is included below.

I'm sorry the mail turned out so long; I hope that won't discourage you from 
reading it. I will be extremely grateful if you guys can help us figure out what 
makes Logback go crazy.
<!--
  Virgo log configuration
  See http://www.eclipse.org/virgo/documentation/virgo-documentation-3.6.3.RELEASE/docs/virgo-user-guide/htmlsingle/virgo-user-guide.html#configuring-serviceability
-->
<configuration>

   <contextListener class="ch.qos.logback.classic.jul.LevelChangePropagator"/>

   <appender name="LOG_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
      <file>D:/logs/product.log</file>
      <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
         <FileNamePattern>D:/logs/product_%i.log.zip</FileNamePattern>
         <MinIndex>0</MinIndex>
         <MaxIndex>9</MaxIndex>
      </rollingPolicy>
      <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
         <MaxFileSize>50MB</MaxFileSize>
      </triggeringPolicy>
      <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
         <Pattern>[%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX}] [%-5level] %-28.28thread %X{sessionGuid} %-64.64logger{64} %X{medic.eventCode} %msg %ex%n</Pattern>
      </encoder>
   </appender>

   <appender name="EVENT_LOG_STDOUT" class="org.eclipse.virgo.medic.log.logback.ReroutingAwareConsoleAppender">
      <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
         <Pattern>[%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX}] [%-5level] %-28.28thread %X{sessionGuid} &lt;%X{medic.eventCode}&gt; %msg %ex%n</Pattern>
      </encoder>
   </appender>

   <!-- eventlog.log is a small log recording only important events in Virgo, i.e. bundle activities, etc. -->
   <appender name="EVENT_LOG_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
      <file>D:/logs/eventlog.log</file>
      <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
         <FileNamePattern>D:/logs/eventlog_%i.log.zip</FileNamePattern>
         <MinIndex>1</MinIndex>
         <MaxIndex>4</MaxIndex>
      </rollingPolicy>
      <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
         <MaxFileSize>10MB</MaxFileSize>
      </triggeringPolicy>
      <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
         <Pattern>[%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX}] [%-5level] %-28.28thread %X{sessionGuid} &lt;%X{medic.eventCode}&gt; %msg %ex%n</Pattern>
      </encoder>
   </appender>

   <!-- query_profile_dataservice.log is only used for debugging -->
   <appender name="DS_QUERY_PROFILE_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
      <file>D:/logs/query_profile_dataservice.log</file>
      <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
         <FileNamePattern>D:/logs/query_profile_dataservice_%i.log.zip</FileNamePattern>
         <MinIndex>0</MinIndex>
         <MaxIndex>9</MaxIndex>
      </rollingPolicy>
      <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
         <MaxFileSize>50MB</MaxFileSize>
      </triggeringPolicy>
      <filter class="ch.qos.logback.core.filter.EvaluatorFilter">
         <evaluator class="ch.qos.logback.classic.boolex.OnMarkerEvaluator">
            <marker>DS_QUERY_PROFILE</marker>
         </evaluator>
         <onMismatch>DENY</onMismatch>
         <onMatch>NEUTRAL</onMatch>
      </filter>
      <encoder>
         <pattern>%msg%n@!@!@</pattern>
      </encoder>
   </appender>

   <root level="INFO">
      <appender-ref ref="LOG_FILE" />
   </root>

   <appender name="PRODUCTMESSAGING_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
      <file>D:/logs/productmessaging.log</file>
      <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
         <FileNamePattern>D:/logs/productmessaging_%i.log.zip</FileNamePattern>
         <MinIndex>0</MinIndex>
         <MaxIndex>9</MaxIndex>
      </rollingPolicy>
      <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
         <MaxFileSize>50MB</MaxFileSize>
      </triggeringPolicy>
      <encoder>
         <Pattern>[%d{yyyy-MM-dd HH:mm:ss.SSS}] [%-5level] %-28.28thread %msg %ex%n</Pattern>
      </encoder>
   </appender>

   <logger level="DEBUG" additivity="true" name="com.acme.messaging">
      <appender-ref ref="PRODUCTMESSAGING_FILE" />
   </logger>

   <logger level="INFO" additivity="true" name="com.acme.data.query.impl.DataServiceImpl">
      <appender-ref ref="DS_QUERY_PROFILE_FILE" />
   </logger>

   <logger level="INFO" additivity="false" name="org.eclipse.virgo.medic.eventlog.localized">
      <appender-ref ref="EVENT_LOG_STDOUT" />
      <appender-ref ref="EVENT_LOG_FILE" />
   </logger>

   <logger level="INFO" additivity="false" name="org.eclipse.virgo.medic.eventlog.default">
      <appender-ref ref="LOG_FILE" />
   </logger>

   <logger level="ERROR" additivity="false" name="com.acme.ph.client.common.extensions">
      <!-- only error messages from phonehome client extensions are necessary -->
      <appender-ref ref="LOG_FILE" />
   </logger>

</configuration>