Hi,
I implemented the fix and the issue was resolved until today :)
I have 2 routers, and both got stuck at the same time due to a connection leak, with connections in "CLOSE_WAIT" towards the monitors.
The only messages in catalina.out were:

WARNING: Imported handshake data with alias <DS>
Feb 13, 2018 2:04:49 PM com.comcast.cdn.traffic_control.traffic_router.secure.CertificateRegistry importCertificateDataList

Could it be that in some rare, probably failing, situations the monitor does not close the connection?

Nir

On Thu, Feb 1, 2018 at 11:27 PM, Nir Sopher wrote:

> Great,
> Thanks!
> Nir
>
> On Thu, Feb 1, 2018 at 11:12 PM, Jeffrey Martin wrote:
>
>> Hi Nir,
>> This issue is defined by:
>>
>> Jira: https://issues.apache.org/jira/browse/TC-197
>> and Github https://github.com/apache/incubator-trafficcontrol/issues/916
>>
>> I will be working on a pull request to address this issue in 2.2. The
>> workaround is in the second link above.
>> Jeff
>>
>> On Thu, Feb 1, 2018 at 4:09 PM, Jeffrey Martin wrote:
>>
>> > Hi Nir,
>> >
>> > On Thu, Feb 1, 2018 at 4:01 PM, Nir Sopher wrote:
>> >
>> >> Hi,
>> >>
>> >> One of my routers got stuck today, not being able to answer http
>> >> requests (routing and API).
>> >> When trying to investigate the issue, I found catalina.log with a lot
>> >> of messages complaining about a failure to open a socket due to too
>> >> many open files. See example below.
>> >> No issues were found in the log prior to that point, beyond periodic
>> >> warnings about pulling the certificates every 5 minutes.
>> >>
>> >> When trying to understand "what are these open files", I found about
>> >> 4k open connections in "CLOSE_WAIT" towards the monitor.
>> >> Note: I'm running TC2.1 RC3 with golang traffic-monitor.
>> >>
>> >> Has anyone encountered a similar issue?
>> >> Are the warnings about pulling the certificates a normal thing?
>> >>
>> >> Thanks,
>> >> Nir
>> >>
>> >> Feb 01, 2018 7:33:09 AM com.comcast.cdn.traffic_control.traffic_router.secure.CertificateRegistry importCertificateDataList
>> >> WARNING: Imported handshake data with alias my-ds.my-cdn.com
>> >> Feb 01, 2018 8:43:13 AM org.apache.tomcat.util.net.NioEndpoint$Acceptor run
>> >> SEVERE: Socket accept failed
>> >> java.io.IOException: Too many open files
>> >>     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>> >>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>> >>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>> >>     at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:1309)
>> >>     at java.lang.Thread.run(Thread.java:745)
>> >>
>> >> Feb 01, 2018 8:43:14 AM org.apache.tomcat.util.net.NioEndpoint$Acceptor run
>> >> SEVERE: Socket accept failed
>> >> java.io.IOException: Too many open files
>> >>     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>> >>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>> >>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>> >>     at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:1309)
>> >>     at java.lang.Thread.run(Thread.java:745)
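
For anyone hitting the same symptom, here is a minimal sketch for checking which remote hosts the "CLOSE_WAIT" sockets point at. It is not part of Traffic Router or Traffic Monitor. It assumes Linux `netstat -tan` style output, where the fifth column is the foreign address and the sixth is the state. The function name `count_close_wait` and the sample addresses are made up for illustration. As background, a TCP socket stays in CLOSE_WAIT after the peer has sent its FIN and until the local application closes its own end, so the counts show sockets the local process has not closed yet.

```python
from collections import Counter

# Hypothetical sample in `netstat -tan` layout; the addresses are documentation
# placeholders, not values taken from the thread.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:80           192.0.2.20:41000        CLOSE_WAIT
tcp        1      0 192.0.2.10:80           192.0.2.20:41002        CLOSE_WAIT
tcp        0      0 192.0.2.10:80           192.0.2.30:52000        ESTABLISHED
tcp6       0      0 ::1:3333                ::1:45000               CLOSE_WAIT
"""


def count_close_wait(netstat_lines):
    """Count CLOSE_WAIT sockets per remote host from `netstat -tan` style lines."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip banner/header rows and anything that is not TCP
        if fields[5] != "CLOSE_WAIT":
            continue
        # Strip the port: everything after the last ':' in the foreign address
        remote_host = fields[4].rsplit(":", 1)[0]
        counts[remote_host] += 1
    return counts


# Print hosts with the most CLOSE_WAIT sockets first
for host, n in count_close_wait(SAMPLE.splitlines()).most_common():
    print(n, host)
```

On a real router the same function could be fed the captured output of `netstat -tan` instead of `SAMPLE`. If the top entries are the monitor addresses, that matches the leak described in this thread.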
