[ 
https://issues.apache.org/jira/browse/TS-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Carlin updated TS-1316:
-----------------------------

    Description: 
We had a pair of ATS 3.2.0 boxes that stopped passing traffic simultaneously.  
Here are the traffic.out msgs we saw on both boxes:

[Jun 22 05:53:27.637] Server {0x2b6b6d9da700} WARNING: cache directory overflow 
on '/dev/dm-4' segment 36, purging...
[Jun 22 05:56:05.542] Server {0x2b6b6d2d3700} WARNING: cache directory overflow 
on '/dev/dm-4' segment 85, purging...
[Jun 22 05:56:07.434] Server {0x2b6b6d4d5700} WARNING: cache directory overflow 
on '/dev/dm-4' segment 71, purging...
[Jun 22 05:58:24.743] Server {0x2b6b6d8d9700} WARNING: cache directory overflow 
on '/dev/dm-4' segment 33, purging...

Those messages went on for a couple of minutes, then traffic apparently ceased; 
from then on our monitoring system saw connection refused on port 80 of the ATS 
boxes. The connection-refused state persisted for many hours until ATS was 
restarted. There were no traffic_cop messages in /var/log/messages indicating 
that the heartbeat check had failed.
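For context, the check our monitoring performs is essentially a plain TCP 
connect to port 80. A minimal sketch in Python (the hostname and timeout are 
illustrative assumptions, not our actual monitoring code):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False

# On the failed boxes a check like this would report False from the time
# of the overflow messages until traffic_server was restarted.
print(port_open("ats-box.example.com", 80))
```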

Here are the relevant ATS settings/stats:

proxy.process.cache.bytes_total = 190690320384
proxy.process.cache.direntries.total = 5817752
proxy.config.cache.min_average_object_size = 32768

We originally came up with proxy.config.cache.min_average_object_size by 
waiting for the cache to fill and dividing proxy.process.cache.bytes_used by 
proxy.process.cache.direntries.used, which came out to about 34KB.
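Spelling out the sizing arithmetic from the stats above (ATS applies internal 
rounding when it builds the directory, so the reported direntries.total differs 
slightly from the straight division):

```python
bytes_total = 190690320384   # proxy.process.cache.bytes_total
min_avg_obj = 32768          # proxy.config.cache.min_average_object_size (32KB)

# The directory is sized from roughly cache bytes / min_average_object_size,
# so this approximates proxy.process.cache.direntries.total (reported: 5817752).
print(bytes_total // min_avg_obj)   # 5819406

# Dropping min_average_object_size to 24KB (24576) buys about a third more
# direntries for the same cache size:
print(bytes_total // 24576)         # 7759209
```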

We're assuming ATS ran out of direntries and it didn't handle this situation 
gracefully.  As a possible workaround, I'm going to lower 
proxy.config.cache.min_average_object_size to 24KB.
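For reference, the workaround is a one-line change in records.config (a sketch; 
my understanding is that changing min_average_object_size resizes the cache 
directory and requires clearing the cache on restart, but treat that caveat as 
an assumption worth verifying):

```
CONFIG proxy.config.cache.min_average_object_size INT 24576
```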

Thanks to Bryan Call for helping me troubleshoot this!

  was:
We had a pair of ATS 3.2.0 boxes that stopped passing traffic simultaneously.  
Here are the traffic.out msgs we saw on both boxes:

[Jun 22 05:53:27.637] Server {0x2b6b6d9da700} WARNING: cache directory overflow 
on '/dev/dm-4' segment 36, purging...
[Jun 22 05:56:05.542] Server {0x2b6b6d2d3700} WARNING: cache directory overflow 
on '/dev/dm-4' segment 85, purging...
[Jun 22 05:56:07.434] Server {0x2b6b6d4d5700} WARNING: cache directory overflow 
on '/dev/dm-4' segment 71, purging...
[Jun 22 05:58:24.743] Server {0x2b6b6d8d9700} WARNING: cache directory overflow 
on '/dev/dm-4' segment 33, purging...

Those messages went on for a couple minutes, then traffic apparently ceased - 
our monitoring system saw connection refused for port 80 on ATS from then on. 
The connection refused state went on for many hours until ATS was restarted.  
There were no traffic_cop msgs in /var/log/messages indicating that the 
heartbeat failed.

Here are the relevant ATS settings/stats:

proxy.process.cache.bytes_total = 190690320384
proxy.process.cache.direntries.total = 5817752
proxy.config.cache.min_average_object_size = 32768

We previously came up with proxy.config.cache.min_average_object_size by 
waiting for the cache to fill and dividing proxy.process.cache.bytes_used by 
proxy.process.cache.direntries.used - which equals about 34KB.

We're assuming ATS ran out of direntries and it didn't handle this situation 
gracefully.  As a workaround, I'm going to lower 
proxy.process.cache.direntries.used to 24KB.

Thanks to Bryan Call for helping me troubleshoot this!

    
> ATS connection refused once cache direntries exhausted
> ------------------------------------------------------
>
>                 Key: TS-1316
>                 URL: https://issues.apache.org/jira/browse/TS-1316
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.2.0
>         Environment: RHEL 6.2 x86_64
>            Reporter: David Carlin

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
