Charles Loots created LOGBACK-1402:
--------------------------------------

             Summary: SocketAppender not detecting dirty TCP disconnect. Log 
events go missing.
                 Key: LOGBACK-1402
                 URL: https://jira.qos.ch/browse/LOGBACK-1402
             Project: logback
          Issue Type: Bug
          Components: logback-core
    Affects Versions: 1.2.3
         Environment: I have set up several processes instances, each using 
logback. The instances are meant to be containerized in docker and scale out. 
The instances (senders) use SocketAppender to centralize log events to a 
different container, where a process (receiver) runs with logback with a 
ServerSocketReceiver. At this single instance i am able to further deal with 
the log events in a central place.

The setup works perfectly with one exception. When the receiver container is 
shut down, the senders are not aware of the disconnect and take about 10 
minutes to become aware of the problem with the socket, despite nearly constant 
logging taking place. Only at this point to they reconnect.

 
            Reporter: Charles Loots
            Assignee: Logback dev list


It would appear that a remote container shutdown results in a dirty TCP 
disconnect. The remote TCP server disappears abruptly, leaving the client side 
with the socket in CLOSE_WAIT state. 

While this is the case the client side is happily appending events to the 
SocketAppender which in turn seems to be writing this to the socket (the Send-Q 
on the socket keeps growing) without detecting any errors.

This keeps going for several minutes (between 7 and 12 in my tests) during 
which log events go missing. Only at this very late point is an error raised 
and the reconnect logic triggered.

 

I found that the *AbstractLogstashTcpSocketAppender* from the logstash project 
encountered and fixed this same problem.

[https://github.com/logstash/logstash-logback-encoder/issues/76]

[https://github.com/logstash/logstash-logback-encoder/issues/85]

 

They effectively fixed the problem by attempting a read from the socket (with 
short read timeout) which results in a much more timely detection of the 
disconnect . This could take place on a separate watchdog thread to detect the 
problem.

[https://github.com/logstash/logstash-logback-encoder/blob/master/src/main/java/net/logstash/logback/appender/AbstractLogstashTcpSocketAppender.java#L385]

The call() method tries to constantly read from the socket. The expected 
outcome is that -1 bytes were read within the read timeout period, and this is 
where an exception will be much more reliably triggered when the socket is no 
longer available.

 

 

 

 

 

 



--
This message was sent by Atlassian JIRA
(v7.3.1#73012)
_______________________________________________
logback-dev mailing list
logback-dev@qos.ch
http://mailman.qos.ch/mailman/listinfo/logback-dev

Reply via email to