[ 
https://issues.apache.org/jira/browse/LOG4J2-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296183#comment-17296183
 ] 

Ralph Goers edited comment on LOG4J2-2926 at 3/5/21, 5:05 PM:
--------------------------------------------------------------

Actually, the BerkeleyDB change is directly related to this problem. I had a 
similar problem the other day: network issues in our dev environment 
prevented new connections to Logstash. It turned out the existing 
connections the app had to Logstash didn't fail, but if they had, we would 
have hit this exact problem, and any logs generated during the network 
outage would have been lost. Yes, you can use a Failover appender, but when 
things recover you then have to somehow get those logs into the ELK stack. 
It makes more sense to handle that automatically. You are correct, though, 
that it could perhaps be done more generically: wrap any appender so that 
logs are written to disk when a failure occurs and, once things recover, 
the accumulated logs are retrieved and sent. 

I should point out that using ActiveMQ doesn't really solve the problem, as 
ActiveMQ doesn't buffer messages in the client. They will fail to send just 
as the SocketAppender did. Messages have to be buffered in the application 
to prevent data loss.

The point is this feature is exactly what the reporter is asking for.
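The "wrap any appender" idea could be sketched roughly as follows. This is 
not Log4j2 code; it uses a plain stdlib stand-in for an appender, and every 
name in it (SpoolingSink, spoolFile, etc.) is hypothetical, just to 
illustrate the spill-to-disk-and-replay behavior described above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: try the primary sink; on failure, spill the event
// to a local file; once the primary recovers, replay the spilled events
// (in order) before resuming normal delivery.
public class SpoolingSink {
    private final Consumer<String> primary; // stands in for the wrapped appender
    private final Path spoolFile;           // local on-disk buffer

    public SpoolingSink(Consumer<String> primary, Path spoolFile) {
        this.primary = primary;
        this.spoolFile = spoolFile;
    }

    public void accept(String event) {
        try {
            replaySpool();                  // drain any backlog first
            primary.accept(event);
        } catch (RuntimeException | IOException e) {
            spool(event);                   // primary is down: buffer to disk
        }
    }

    private void spool(String event) {
        try {
            Files.write(spoolFile, List.of(event),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException ignored) {
            // last resort: the event is lost, as with a bare SocketAppender
        }
    }

    private void replaySpool() throws IOException {
        if (!Files.exists(spoolFile)) return;
        for (String line : Files.readAllLines(spoolFile)) {
            primary.accept(line);           // throws if still down; spool kept
        }
        Files.delete(spoolFile);            // backlog replayed successfully
    }
}
```

A real implementation would need to worry about partial replays (this 
sketch is at-least-once: a failure mid-replay can duplicate events) and 
about bounding the spool file, but the control flow is the point.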



> Application OUTAGE due to Unable to write to stream TCP
> -------------------------------------------------------
>
>                 Key: LOG4J2-2926
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-2926
>             Project: Log4j 2
>          Issue Type: Bug
>          Components: Appenders
>    Affects Versions: 2.13.3
>         Environment: Mulesoft, Linux, ELK (hosted service on AWS)
>            Reporter: Kaushik Vankayala
>            Assignee: Ralph Goers
>            Priority: Major
>              Labels: SocketAppender, beginner
>             Fix For: 2.13.3
>
>
> Hi Team, we have recently encountered an outage in our PRODUCTION 
> application. We have custom logging using Log4j2, and the remote server 
> was out of storage. We suspect that was the cause; the error we faced is 
> below:
>  
> 2020-08-30 22:23:04,686 Log4j2-TF-17-AsyncLoggerConfig-9 ERROR Unable to 
> write to stream 
> TCP:api-manager-2623b9734249246e.elb.ap-southeast-1.amazonaws.com:8500 
> for appender SOCKET 
> org.apache.logging.log4j.core.appender.AppenderLoggingException: Error 
> sending to 
> TCP:api-manager-2623b9734249246e.elb.ap-southeast-1.amazonaws.com:8500 
> for 
> api-manager-2623b9734249246e.elb.ap-southeast-1.amazonaws.com/52.221.23.118:8500
>  at 
> org.apache.logging.log4j.core.net.TcpSocketManager.write(TcpSocketManager.java:231)
>  at 
> org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:190)
>  at 
> org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.writeByteArrayToManager(AbstractOutputStreamAppender.java:206)
>  at 
> org.apache.logging.log4j.core.appender.SocketAppender.directEncodeEvent(SocketAppender.java:459)
>  at 
> org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
>  "http.listener.02 SelectorRunner" #76 prio=5 os_prio=0 
> tid=0x00007f314c52d800 nid=0xb19 waiting for monitor entry 
> [0x00007f314a6fc000] java.lang.Thread.State: BLOCKED (on object monitor) at 
> org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.enqueue(AsyncLoggerConfigDisruptor.java:376)
>  - waiting to lock <0x0000000088b43a58> (a java.lang.Object) at 
> org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.enqueueEvent(AsyncLoggerConfigDisruptor.java:330)
>  at 
> org.apache.logging.log4j.core.async.AsyncLoggerConfig.logInBackgroundThread(AsyncLoggerConfig.java:159)
>  at 
> org.apache.logging.log4j.core.async.EventRoute$1.logMessage(EventRoute.java:46)
>  
> We tried to follow the link 
> (https://help.mulesoft.com/s/article/Mule-instance-which-implements-a-log4j2-SocketAppender-complains-with-Broken-Pipe-Error).
>  
> Unlike Splunk, we have ELK in our architecture. Our Socket appender looks 
> like this:
>  
> {{<Socket name="SOCKET" host="${sys:tcp.host}" port="${sys:tcp.port}" 
> reconnectDelayMillis="30000" immediateFail="false" bufferedIo="true" 
> bufferSize="204800" protocol="TCP" immediateFlush="false">}}
>  
> We have a couple of queries; could you kindly address them?
>  # With the current Socket Appender, what additional tags may be needed 
> to stream the logs independently, irrespective of the remote 
> destination's status? 
>  # Our ELK server is a hosted service. The first point after Cloudhub is 
> a Load Balancer, after which there is an EC2 server where Logstash is 
> running. Do we need to configure any keep-alive settings at the OS level?
>  # Why should a storage issue at a remote destination cause an issue in 
> the socket appender and eventually take down the application? Logging via 
> the socket appender should ideally be an independent activity.
> Finally, we would request that you recommend a solution for the case 
> where the remote endpoint's storage is exhausted or a TCP socket is dead, 
> and advise how we can avoid an outage of the MuleSoft application due to 
> a Log4j2 logging problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
