[
https://issues.apache.org/jira/browse/TS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200197#comment-15200197
]
Alan M. Carroll commented on TS-4279:
-------------------------------------
Yes. A free list of 371 entries out of 857360 is a very full situation, and you
need to set your average object size (proxy.config.cache.min_average_object_size)
down by at least half, possibly more. The only cost of making that value smaller
is a larger memory footprint, which you'll want to keep an eye on. Do note that
changing the value will invalidate the cache contents.

I suspect the crash comes from bad interactions with the HTTP state machine when
the cache is full, and unfortunately it is unlikely to be fixed any time soon. I
am working on other cache fixes and may eventually be able to look at this, or
do a better job of reclaiming. Currently reclaiming works quite poorly: it
essentially decimates the first doc entries in a segment, which is not very
successful when there are lots of multi-fragment objects or alternates. What
might work better is to pretend to write to a large chunk of the stripe and use
the reclaim logic for that write to clear space.
> ats fallen into dead loop for cache directory overflow
> ------------------------------------------------------
>
> Key: TS-4279
> URL: https://issues.apache.org/jira/browse/TS-4279
> Project: Traffic Server
> Issue Type: Bug
> Components: Cache
> Affects Versions: 5.3.1
> Reporter: taoyunxing
> Fix For: 6.2.0
>
>
> CPU: 40 cores, Mem: 120GB, Disk: 1 x 300GB system + 11 x 899GB data (raw
> devices), OS: CentOS 6.6, ATS: 5.3.1
> records.config:
> CONFIG proxy.config.cache.min_average_object_size INT 1048576
> CONFIG proxy.config.cache.ram_cache.algorithm INT 1
> CONFIG proxy.config.cache.ram_cache_cutoff INT 4194304
> CONFIG proxy.config.cache.ram_cache.size INT 64424509440
> storage.config:
> /dev/sdc id=cache.disk.1
> I encountered what looks like a dead loop in ATS 5.3.1 on two production
> hosts: for a long time, diags.log was flooded with warnings like this:
> {code}
> [Mar 16 13:04:32.730] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.732] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.733] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.735] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.737] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.739] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.742] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.744] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.747] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.750] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.753] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> [Mar 16 13:04:32.756] Server {0x2b8ffc544700} WARNING: <CacheDir.cc:502 (freelist_clean)> cache directory overflow on '/dev/sdc' segment 4, purging...
> {code}
> ATS restarts every few hours, and the TIME_WAIT count is far larger than the
> ESTABLISHED TCP connection count.
> The following is the current directory snapshot of disk /dev/sdc on one of the
> hosts:
> {code}
> Directory for [cache.disk.1 172032:109741163]
> Bytes: 8573600
> Segments: 14
> Buckets: 15310
> Entries: 857360
> Full: 852904
> Empty: 4085
> Stale: 0
> Free: 371
> Bucket Fullness: 4085 15800 32044 41621
> 42175 33137 22232 12605
>
> Segment Fullness: 60903 60918 60914 60947 60956
> 60947 60872 60943 60918 60927
> 60858 60917 60927 60957
> Freelist Fullness: 45 30 27 13 0
> 7 89 5 32 12
> 83 0 20 8
> {code}
> I wonder why freelist[4] is zero, which causes the ATS dead loop. Can anyone
> help? Thanks a lot.
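The snapshot numbers can be cross-checked directly. The sketch below (hypothetical helper names; the values are copied from the snapshot above) verifies that the counters are consistent and lists the segments whose free lists are empty. It assumes that the Segment Fullness values count full entries per segment, which agrees with their summing to the Full total, and that every segment holds Entries / Segments entries.

```python
# Consistency checks on the directory snapshot for /dev/sdc reported above.
# Assumption: "Segment Fullness" counts full entries per segment and every
# segment holds Entries / Segments entries.

SNAPSHOT = {"Bytes": 8573600, "Segments": 14, "Entries": 857360,
            "Full": 852904, "Empty": 4085, "Free": 371}
SEGMENT_FULL = [60903, 60918, 60914, 60947, 60956, 60947, 60872,
                60943, 60918, 60927, 60858, 60917, 60927, 60957]
FREELIST = [45, 30, 27, 13, 0, 7, 89, 5, 32, 12, 83, 0, 20, 8]


def exhausted_segments(freelist: list[int]) -> list[int]:
    """Indexes of segments whose free list is empty."""
    return [seg for seg, free in enumerate(freelist) if free == 0]


def segment_report(snapshot: dict, seg_full: list[int], freelist: list[int]):
    """Per-segment (index, full, free, other); other = neither full nor free."""
    per_segment = snapshot["Entries"] // snapshot["Segments"]
    return [(seg, full, free, per_segment - full - free)
            for seg, (full, free) in enumerate(zip(seg_full, freelist))]


if __name__ == "__main__":
    s = SNAPSHOT
    report = segment_report(s, SEGMENT_FULL, FREELIST)
    # Totals should agree with the per-segment breakdown.
    assert s["Full"] + s["Empty"] + s["Free"] == s["Entries"]
    assert sum(SEGMENT_FULL) == s["Full"] and sum(FREELIST) == s["Free"]
    assert sum(other for _, _, _, other in report) == s["Empty"]
    print("bytes per entry:", s["Bytes"] / s["Entries"])
    print("segments with an empty free list:", exhausted_segments(FREELIST))
    for seg, full, free, other in report:
        print(f"segment {seg:2}: full={full} free={free} other={other}")
```

With these numbers, segments 4 and 11 have empty free lists (segment 4 matches the overflow warnings in the log), and every segment is within about 400 entries of its 61240-entry capacity. That suggests freelist[4] being zero is a symptom of an almost completely full directory rather than an isolated fault.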
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)