Hi,

I think I found the root cause.
We had an HTTPS delivery service with no SSL keys.
As a result, the router was not able to fully load the configuration.

The bad delivery service was deleted and a new cr-config snapshot was taken.
However, the router remained stuck, still trying to get the old certificate.
It therefore kept rejecting the new config and logged the messages below.

Could this be the reason the connections towards the monitor are not being
closed properly?
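
One way to check whether connections towards the monitor are piling up is to count sockets in CLOSE_WAIT per peer address. The following is a minimal sketch, assuming a Linux host where `ss -tan` (from iproute2) prints the usual "State Recv-Q Send-Q Local:Port Peer:Port" columns. The function names are hypothetical and not part of Traffic Control.

```python
import subprocess
from collections import Counter


def count_close_wait(ss_output):
    """Count CLOSE-WAIT sockets per peer host in `ss -tan` style output."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed columns: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or fields[0] != "CLOSE-WAIT":
            continue
        # Split on the last colon so bracketed IPv6 peers keep their address.
        peer_host, _, _peer_port = fields[4].rpartition(":")
        counts[peer_host] += 1
    return counts


def read_ss_output():
    """Run `ss -tan` and return its text output (requires iproute2)."""
    result = subprocess.run(
        ["ss", "-tan"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `count_close_wait(read_ss_output()).most_common()` on the router lists the peers holding the most CLOSE_WAIT sockets. A count for the monitor address that keeps growing over time would be consistent with a leak on the router side.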

Nir

INFO  2018-02-14T07:33:33.020 [New I/O worker #2] com.comcast.cdn.traffic_control.traffic_router.core.config.ConfigHandler - Waiting for https certificates to support new config 5cb1f62b
INFO  2018-02-14T07:33:34.021 [New I/O worker #2] com.comcast.cdn.traffic_control.traffic_router.core.config.ConfigHandler - Waiting for https certificates to support new config 5cb1f62b
INFO  2018-02-14T07:33:34.990 [pool-5-thread-1] com.comcast.cdn.traffic_control.traffic_router.core.monitor.TrafficMonitorWatcher - Loading properties from /opt/traffic_router/conf/traffic_monitor.properties
INFO  2018-02-14T07:33:34.994 [New I/O worker #3] com.comcast.cdn.traffic_control.traffic_router.core.config.ConfigHandler - Entered processConfig
INFO  2018-02-14T07:33:35.021 [New I/O worker #2] com.comcast.cdn.traffic_control.traffic_router.core.config.ConfigHandler - Exiting processConfig: processing of config with timestamp Wed Feb 14 07:31:46 UTC 2018 was cancelled
WARN  2018-02-14T07:33:35.021 [New I/O worker #2] com.comcast.cdn.traffic_control.traffic_router.core.util.PeriodicResourceUpdater - File rejected: /opt/traffic_router/db/cr-config.json
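
To spot this stuck state in a longer log, the two messages in the excerpt above can be counted per config id and per rejected file. This is a rough sketch that only matches the message texts shown here. The function name is hypothetical, and the patterns tolerate the line wrapping that mail clients introduce.

```python
import re
from collections import Counter

# Patterns copied from the log messages quoted above; \s+ tolerates wrapping.
WAITING = re.compile(
    r"Waiting\s+for\s+https\s+certificates\s+to\s+support\s+new\s+config\s+(\w+)"
)
REJECTED = re.compile(r"File\s+rejected:\s+(\S+)")


def summarize_router_log(text):
    """Return (waits per config id, rejections per file) found in log text."""
    waiting = Counter(WAITING.findall(text))
    rejected = Counter(REJECTED.findall(text))
    return waiting, rejected
```

A single config id with a steadily rising wait count, together with repeated rejections of the same file, would match the behaviour described in this message.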


On Wed, Feb 14, 2018 at 9:51 PM, Nir Sopher <n...@qwilt.com> wrote:

> Hi,
>
> I implemented the fix and the issue was resolved,
> until today :)
>
> I have 2 routers, and both got stuck at the same time due to a connection
> leak, with connections in "CLOSE_WAIT" towards the monitors.
> The only messages in catalina.out were:
> WARNING: Imported handshake data with alias <DS>
> Feb 13, 2018 2:04:49 PM com.comcast.cdn.traffic_control.traffic_router.secure.CertificateRegistry importCertificateDataList
>
> Can it be that in some rare, probably failing, situations, the monitor
> does not close the connection?
> Nir
>
> On Thu, Feb 1, 2018 at 11:27 PM, Nir Sopher <n...@qwilt.com> wrote:
>
>> Great,
>> Thanks!
>> Nir
>>
>> On Thu, Feb 1, 2018 at 11:12 PM, Jeffrey Martin <martin.jef...@gmail.com>
>> wrote:
>>
>>> Hi Nir,
>>>    This issue is defined by:
>>>
>>>  Jira: https://issues.apache.org/jira/browse/TC-197
>>> and Github https://github.com/apache/incubator-trafficcontrol/issues/916
>>>
>>> I will be working on a pull request to address this issue in 2.2. The
>>> workaround is in the second link above.
>>> Jeff
>>>
>>>
>>> On Thu, Feb 1, 2018 at 4:09 PM, Jeffrey Martin <martin.jef...@gmail.com>
>>> wrote:
>>>
>>> > Hi Nir,
>>> >
>>> >
>>> > On Thu, Feb 1, 2018 at 4:01 PM, Nir Sopher <n...@qwilt.com> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> One of my routers got stuck today and was unable to answer HTTP
>>> >> requests (routing and API).
>>> >> When investigating the issue, I found catalina.log full of messages
>>> >> complaining about a failure to open a socket due to too many open
>>> >> files. See the example below.
>>> >> No issues were found in the log prior to that point, beyond periodic
>>> >> warnings about pulling the certificates every 5 minutes.
>>> >>
>>> >> When trying to understand what these open files are, I found about
>>> >> 4k open connections in "CLOSE_WAIT" towards the monitor.
>>> >> Note: I'm running TC2.1 RC3 with golang traffic-monitor.
>>> >>
>>> >> Has anyone encountered a similar issue?
>>> >> Are the warnings for pulling the certificates a normal thing?
>>> >>
>>> >> Thanks,
>>> >> Nir
>>> >>
>>> >> Feb 01, 2018 7:33:09 AM com.comcast.cdn.traffic_control.traffic_router.secure.CertificateRegistry importCertificateDataList
>>> >> WARNING: Imported handshake data with alias my-ds.my-cdn.com
>>> >> Feb 01, 2018 8:43:13 AM org.apache.tomcat.util.net.NioEndpoint$Acceptor run
>>> >> SEVERE: Socket accept failed
>>> >> java.io.IOException: Too many open files
>>> >>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>> >>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>>> >>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>>> >>         at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:1309)
>>> >>         at java.lang.Thread.run(Thread.java:745)
>>> >>
>>> >> Feb 01, 2018 8:43:14 AM org.apache.tomcat.util.net.NioEndpoint$Acceptor run
>>> >> SEVERE: Socket accept failed
>>> >> java.io.IOException: Too many open files
>>> >>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>> >>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>>> >>         at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>>> >>         at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:1309)
>>> >>         at java.lang.Thread.run(Thread.java:745)
>>> >>
>>> >
>>> >
>>>
>>
>>
>
