Re: kswapd0 causing read timeouts

Gurpreet Singh Fri, 08 Jun 2012 22:02:53 -0700

Aaron, Ruslan,
I changed the disk access mode to mmap_index_only, and it has been stable
ever since, or at least for the past 20 hours. Previously, within about 10-12
hours, as soon as resident memory was full, the client would start
timing out on all its reads. It looks fine for now; I am going to let it
run to see how long it lasts and whether the problem comes back.
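For reference, the change is one line in cassandra.yaml (Ruslan's suggestion, quoted further down), and the node has to be restarted to pick it up. A minimal Python sketch of applying it to the config text, assuming the four values Cassandra 1.x accepts for disk_access_mode (auto, mmap, mmap_index_only, standard); `set_disk_access_mode` is a hypothetical helper, not part of Cassandra:

```python
import re

# Values Cassandra 1.x accepts for disk_access_mode (assumption: confirm
# against the documentation for the exact version in use).
VALID_MODES = {"auto", "mmap", "mmap_index_only", "standard"}

# An active top-level key is preferred over a commented-out one, so the
# file never ends up with two live disk_access_mode lines.
_ACTIVE_RE = re.compile(r"^disk_access_mode:.*$", re.MULTILINE)
_COMMENTED_RE = re.compile(r"^#[ \t]*disk_access_mode:.*$", re.MULTILINE)


def set_disk_access_mode(yaml_text: str, mode: str) -> str:
    """Return yaml_text with disk_access_mode set to `mode`.

    Rewrites the first active (else commented-out) disk_access_mode line,
    or appends one if the key is absent. Hypothetical helper.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown disk_access_mode: {mode!r}")
    new_line = f"disk_access_mode: {mode}"
    for pattern in (_ACTIVE_RE, _COMMENTED_RE):
        if pattern.search(yaml_text):
            return pattern.sub(new_line, yaml_text, count=1)
    # Key not present at all: append it on its own line.
    if yaml_text and not yaml_text.endswith("\n"):
        yaml_text += "\n"
    return yaml_text + new_line + "\n"
```

In this thread's case the result is simply a `disk_access_mode: mmap_index_only` line; writing the file back and restarting the node are left out of the sketch.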

Aaron,
Yes, I had turned swap off.

The total CPU utilization was roughly 700%. kswapd0 appeared to be using
just one CPU, but Cassandra's (jsvc) CPU utilization increased quite a
bit. top reported high system CPU and low user CPU.
vmstat was not showing any swapping. The maximum Java heap size is 8 gigs,
and only 4 gigs was in use, so the heap was doing fine; there was no GC in
the logs. iostat looked OK from what I remember, but I will have to
reproduce the issue to get the exact numbers.
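Aaron's "one core or all cores" question comes down to normalisation: top's per-process figure counts 100% per core, so roughly 700% on this 8-core box is about 87.5% of the machine, whereas vmstat's us/sy columns are already machine-wide and its si/so columns show swap traffic directly. A minimal Python sketch, assuming the usual procps `vmstat` layout (a group-title line, a column-name line starting with "r b", then numeric rows); the function names are hypothetical:

```python
from __future__ import annotations


def top_percent_to_machine_share(percent: float, cores: int) -> float:
    """Convert a top-style CPU figure (100% == one core) into a 0..1 share
    of the whole machine."""
    return percent / (100.0 * cores)


def parse_vmstat(output: str) -> list[dict[str, int]]:
    """Parse procps `vmstat` text into one {column: value} dict per sample.

    Header blocks (which vmstat may repeat) are used for column names and
    never parsed as data.
    """
    header: list[str] | None = None
    rows: list[dict[str, int]] = []
    for line in output.splitlines():
        tokens = line.split()
        if tokens[:2] == ["r", "b"]:
            header = tokens  # r b swpd free buff cache si so bi bo in cs us sy ...
        elif header and tokens and all(t.isdigit() for t in tokens):
            rows.append(dict(zip(header, map(int, tokens))))
    return rows


def summarize(rows: list[dict[str, int]]) -> dict[str, object]:
    """Flag any swap-in/swap-out traffic and report peak user/system CPU.

    vmstat's us/sy are percentages of all CPUs combined.
    """
    return {
        "swapping": any(r.get("si", 0) or r.get("so", 0) for r in rows),
        "max_us": max(r["us"] for r in rows),
        "max_sy": max(r["sy"] for r in rows),
    }
```

It can be fed the output of something like `vmstat 5 12` captured while the timeouts are happening; the first row reports averages since boot, so it is worth discarding. High sy, low us and zero si/so would be consistent with what is described above: the kernel busy reclaiming memory rather than swapping.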

cfstats latency had gone very high, but that is partly due to the high CPU
usage.

One thing was clear: SHR was inching higher (due to the mmap), while the
buffer cache, which started at about 20-25 MB, shrank to 2 MB by the end,
which probably means the page cache was being evicted by kswapd0. Is
there a way to fix the size of the buffer cache and not let the system
evict it in favour of mmap?
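The "buffers" figure in top is not the whole page cache. In /proc/meminfo, Buffers covers only raw disk blocks (mostly filesystem metadata); file data, including the mmapped SSTables, is counted under Cached, and Mapped shows how much of that is currently mapped into processes. A minimal Python sketch for sampling those counters over time; the function names are hypothetical:

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def cache_snapshot(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read the fields relevant to page-cache pressure, in kB."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {k: info.get(k, 0) for k in ("MemFree", "Buffers", "Cached", "Mapped")}
```

Logging `cache_snapshot()` every few minutes over the 10-12 hour run would show whether the unmapped part of Cached (Cached minus Mapped) shrinks as Mapped grows, which is the eviction pattern suspected here.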

Also, mmapping data files would cause not only the requested data to be
read into main memory, but also a bunch of extra pages (readahead), which
would not be very useful, right? The same thing for the index would
actually be more useful, as there would be more index entries in the
readahead part, and the index files, being small, wouldn't cause enough
memory pressure to evict the page cache. Mmapping the data files would
make sense if the data size, or at least the hot data set, is smaller than
the RAM; otherwise mmapping just the index would probably be the better
choice, no? In my case the data size is 85 gigs, while available RAM is
16 gigs (only 8 gigs after the heap).
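The sizing argument above can be made concrete. A rough Python sketch, assuming the page cache available to Cassandra is total RAM minus the mlocked heap (off-heap memory ignored), and that each mmap page fault pulls in a full readahead window of 128 KiB (a common Linux read_ahead_kb default; the real value depends on the device). The rule of thumb and the `hot_fraction` parameter are hypothetical, not Cassandra settings. For comparison, on a 64-bit JVM `auto` resolves to `mmap` for both data and index files.

```python
GIB = 1024 ** 3


def page_cache_budget(ram_bytes: int, heap_bytes: int, reserve_bytes: int = 0) -> int:
    """RAM left for the page cache after the (mlocked) JVM heap and an
    optional OS reserve. Simplification: off-heap allocations are ignored."""
    return max(ram_bytes - heap_bytes - reserve_bytes, 0)


def suggest_disk_access_mode(data_bytes: int, index_bytes: int,
                             cache_bytes: int, hot_fraction: float = 1.0) -> str:
    """Hypothetical rule of thumb: mmap data files only if the hot part of
    the data plus the indexes fits in the page cache budget."""
    hot_data = data_bytes * hot_fraction
    if hot_data + index_bytes <= cache_bytes:
        return "mmap"
    if index_bytes <= cache_bytes:
        return "mmap_index_only"
    return "standard"


def readahead_amplification(bytes_wanted: int, window_bytes: int = 128 * 1024) -> float:
    """Bytes pulled into memory per byte requested, assuming every read
    faults in one whole readahead window."""
    return max(window_bytes, bytes_wanted) / bytes_wanted
```

With the numbers in this thread (16 gigs of RAM, 8 gigs of heap, 85 gigs of data) the budget is 8 gigs, under a tenth of the data, so unless the hot set is that small the sketch picks mmap_index_only for any index that fits in 8 gigs. The index size is not given in the thread, so it stays a parameter.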

/G

On Fri, Jun 8, 2012 at 11:44 AM, aaron morton <aa...@thelastpickle.com> wrote:

> Ruslan,
> Why did you suggest changing the disk_access_mode ?
>
> Gurpreet,
> I would leave the disk_access_mode with the default until you have a
> reason to change it.
>
>> > 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>
> Is swap disabled?
>
>> > Gradually, the system CPU becomes high, almost 70%, and the client
>> > starts getting continuous timeouts
>>
> 70% of one core or 70% of all cores?
> Check the server logs: is there GC activity?
> Check nodetool cfstats to see the read latency for the CF.
>
> Take a look at vmstat to see if you are swapping, and look at iostat to
> see if IO is the problem
> http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8/06/2012, at 9:00 PM, Gurpreet Singh wrote:
>
> Thanks Ruslan.
> I will try the mmap_index_only.
> Is there any guideline as to when to leave it to auto and when to use
> mmap_index_only?
>
> /G
>
> On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov <ruslan.usi...@gmail.com> wrote:
>
>> disk_access_mode: mmap??
>>
>> Set disk_access_mode: mmap_index_only in cassandra.yaml
>>
>> 2012/6/8 Gurpreet Singh <gurpreet.si...@gmail.com>:
>> > Hi,
>> > I am testing cassandra 1.1 on a 1 node cluster.
>> > 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>> >
>> > cassandra 1.1.1
>> > heap size: 8 gigs
>> > key cache size in mb: 800 (used only 200mb till now)
>> > memtable_total_space_in_mb : 2048
>> >
>> > I am running a read workload, about 30 reads/second, with no writes at all.
>> > The system runs fine for roughly 12 hours.
>> >
>> > jconsole shows that my heap size has hardly touched 4 gigs.
>> > top shows -
>> >   SHR increases slowly from 100 mb to 6.6 gigs in these 12 hrs
>> >   RES increases slowly from 6 gigs all the way to 15 gigs
>> >   buffers are at a healthy 25 mb at some point and go down to 2 mb in
>> >   these 12 hrs
>> >   VIRT stays at 85 gigs
>> >
>> > I understand that SHR goes up because of mmap, and RES goes up because it
>> > includes the SHR value as well.
>> >
>> > After around 10-12 hrs, the CPU utilization of the system starts
>> > increasing, and I notice that the kswapd0 process becomes more active.
>> > Gradually, the system CPU becomes high, almost 70%, and the client
>> > starts getting continuous timeouts. The fact that the buffers went down
>> > from 20 mb to 2 mb suggests that kswapd0 is probably evicting the
>> > pagecache.
>> >
>> > Is there a way to stop kswapd0 from kicking in even when there is no
>> > swap configured? This is very easily reproducible for me, and I would
>> > like a way out of this situation. Do I need to adjust VM memory
>> > management settings like pagecache, vfs_cache_pressure, things like that?
>> >
>> > Just some extra information: JNA is installed, mlockall is successful,
>> > and there is no compaction running.
>> > I would appreciate any help on this.
>> > Thanks
>> > Gurpreet
>> >
>> >
>>
>
>
>
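On the original question about vfs_cache_pressure and similar settings: before tuning anything, it helps to record the current values so that any change can be compared against a baseline. A minimal Python sketch that reads a few reclaim-related knobs from /proc/sys/vm (swappiness, vfs_cache_pressure and min_free_kbytes are standard Linux sysctls); whether any of them helps this workload is not settled in the thread, and `read_vm_sysctls` is a hypothetical helper:

```python
from __future__ import annotations

from pathlib import Path

# Standard Linux reclaim-related knobs. This sketch only reads them; it
# makes no claim about which values suit Cassandra.
VM_KNOBS = ("swappiness", "vfs_cache_pressure", "min_free_kbytes")


def read_vm_sysctls(base: str = "/proc/sys/vm") -> dict[str, int]:
    """Return {knob: value} for each knob file present under `base`."""
    values: dict[str, int] = {}
    for knob in VM_KNOBS:
        path = Path(base) / knob
        if path.exists():
            values[knob] = int(path.read_text().strip())
    return values
```

The same values can be checked from a shell with `sysctl vm.swappiness vm.vfs_cache_pressure vm.min_free_kbytes`.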
