ResilientOutputStreamBase does not recover from transient network outage when
the appender uses the LayoutWrappingEncoder
-------------------------------------------------------------------------------------------------------------------------
Key: LBCORE-246
URL: http://jira.qos.ch/browse/LBCORE-246
Project: logback-core
Issue Type: Bug
Components: Layout
Affects Versions: 0.9.30
Environment: Windows
Reporter: Thomas P Rees
Assignee: Logback dev list
Our customer has a problem with network glitches, so we upgraded from 0.9.17 to
0.9.30 to take advantage of the ResilientOutputStream. The particular appender
that caused problems at the customer site used a custom layout class, so we
wrapped that class in a LayoutWrappingEncoder in build.xml:
<appender name="AUDIT"
class="ch.qos.logback.core.rolling.RollingFileAppender">
<File>${logging.audit.root}/audit/audit.log</File>
<encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
<layout class="com.chiliad.log.util.PropertiesLayout" />
</encoder>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<!-- stuff deleted here -->
</rollingPolicy>
</appender>
We tested the ResilientOutputStream by editing logback.xml to write the logfile
to another system, then we removed the ethernet cable from that other system. A
couple minutes later we plugged the ethernet cable back into the other system.
Stdout of the logging system reported
2012/02/14 16:33:47 | INFO | jvm 1 | srvmain | 16:33:47,414 |-ERROR in
c.q.l.c.recovery.ResilientFileOutputStream@12297237 - IO failure while writing
to file [\\cushman\e\logs\audit\audit.log] java.io.IOException: The specified
network name is no longer available
There was also a long stack trace from Java with more detail about the
IOException.
However immediately after this stack trace the following message appeared
2012/02/14 16:33:47 | INFO | jvm 1 | srvmain | 16:33:47,408 |-INFO in
c.q.l.c.recovery.ResilientFileOutputStream@12297237 - Recovered from IO failure
on file [\\cushman\e\logs\audit\audit.log]
Furthermore, logging did not actually recover. The audit log on the remote
server did not get any more messages, and stdout continued to record the
messaages shown above, always paired with the complaint about the IO failure
and then saying that it recovered from the IO failure.
I downloaded the source code and found the source of the problem. The
LayoutWrappingEncoder always calls write and then flush together:
outputStream.write(convertToBytes(txt));
outputStream.flush();
In ResilientOutputStreamBase the write method checks for a bad state in the
output stream, but the flush method does not:
public void write(int b) {
if (isPresumedInError()) {
if (!recoveryCoordinator.isTooSoon()) {
attemptRecovery();
}
return; // return regardless of the success of the recovery attempt
}
try {
os.write(b);
postSuccessfulWrite();
} catch (IOException e) {
postIOFailure(e);
}
}
public void flush() {
if (os != null) {
try {
os.flush();
postSuccessfulWrite();
} catch (IOException e) {
postIOFailure(e);
}
}
}
So, the write method encounters the IOException and sets the presumedClean flag
to false. However, the flush method does not call isPresumedInError and the
flush succeeds (I presume because there is no data to flush). The flush method
then calls postSuccessfulWrite, which sets the presumedClean flag to true.
I fixed this problem by changing the flush method to call isPresumedInError::
public void flush() {
if (os != null) {
if (isPresumedInError()) {
if (!recoveryCoordinator.isTooSoon()) {
attemptRecovery();
}
return; // return regardless of the success of the recovery attempt
}
try {
os.flush();
postSuccessfulWrite();
} catch (IOException e) {
postIOFailure(e);
}
}
}
Now when we run the test of unplugging the ethernet cable on the system where
the audit logs are written and then plug the cable back in the logger starts
logging again and the logfile resumes recording messages.
Does this solution look correct to you?
thanks
Tom
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://jira.qos.ch/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
_______________________________________________
logback-dev mailing list
[email protected]
http://mailman.qos.ch/mailman/listinfo/logback-dev