Hi Christian,
I have been working on this for a couple of days, but I have not been
able to recreate the issue. I will continue trying to recreate it and
get back to you in a day or two.
Regards,
Raghavendra Bhat
On 09/02/2015 12:45 AM, Christian Rice wrote:
This is still an issue for me. I don’t need anyone to tear the code
apart, but I’d be grateful if someone would even chime in and say,
“yeah, we’ve seen that too.”
From: Christian Rice <[email protected]>
Date: Sunday, August 30, 2015 at 11:18 PM
To: "[email protected]" <[email protected]>
Subject: [Gluster-users] Gluster 3.6.3 performance.cache-size not
working as expected in some cases
I am confused about my caching problem. I’ll try to keep this as
straightforward as possible and include the basic details...
I have a sixteen-node distributed volume, one brick per node, XFS
isize=512, Debian 7/Wheezy, at least 32GB RAM per node. Every brick
node is also a gluster client and, importantly, an HTTP server. We use
a back-end 1GbE network for gluster traffic (eth1). There are a couple
dozen gluster client-only systems accessing this volume as well.
We had a really hot spot on one brick due to an oft-requested file:
every time any httpd process on any gluster client was asked to
deliver the file, it physically fetched it over the back end (we could
see this traffic using, say, ‘iftop -i eth1’), so we decided to
increase the volume cache timeout and cache size. We set the following
values for testing:
performance.cache-size: 16GB
performance.cache-refresh-timeout: 30
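For reference, those options were applied with the standard volume-set
commands, roughly like this (volume name as in the info output further
down):
gluster volume set DOCROOT performance.cache-size 16GB
gluster volume set DOCROOT performance.cache-refresh-timeout 30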
This test was run from a node that didn’t have the requested file on
the local brick:
while true; do cat /path/to/file > /dev/null; done
and what had been very high traffic on the gluster backend network,
delivering the data repeatedly to my requesting node, dropped to
nothing visible.
I thought: good, problem fixed. Caching works. My colleague had run a
test early on to show this perf issue, so he ran it again to sign off.
His testing used curl, because all the real front-end traffic is HTTP,
and all the gluster nodes are web servers, which of course use the
fuse mount to access the document root. Even with our performance
tuning, the traffic on the gluster backend subnet was continuous and
undiminished. I saw no evidence of caching (again using ‘iftop -i
eth1’, which showed a steady 75+% of line rate on a 1GbE link).
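Roughly speaking, his test loop looked something like this (the host
and path here are just placeholders, not his literal command):
while true; do curl -s -o /dev/null http://localhost/path/to/file; done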
Does that make sense at all? We had theorized that we wouldn’t get to
use the VFS/kernel page cache on any node except maybe the one holding
the data on its local brick. That’s what drove us to set the gluster
performance cache. But it doesn’t seem to come into play with HTTP
access.
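One sanity check that might help (just a suggestion, not something
we’ve run yet) would be to drop the kernel caches on both a client
node and the brick node holding the file, then repeat the cat and curl
loops while watching the back-end interface:
sync; echo 3 > /proc/sys/vm/drop_caches   # as root: drops page cache, dentries and inodes
iftop -i eth1                             # watch gluster back-end traffic while the loops run again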
Volume info:
Volume Name: DOCROOT
Type: Distribute
Volume ID: 3aecd277-4d26-44cd-879d-cffbb1fec6ba
Status: Started
Number of Bricks: 16
Transport-type: tcp
Bricks:
<snipped list of bricks>
Options Reconfigured:
performance.cache-refresh-timeout: 30
performance.cache-size: 16GB
The net result of being overwhelmed by a hot spot is that all the
gluster client nodes lose access to the gluster volume; it becomes so
busy it hangs. When the traffic goes away (failing health checks cause
the load balancers to redirect requests elsewhere), the volume
eventually unfreezes and life goes on.
I wish I could type ALL that into a Google query and get a lucid answer :)
Regards,
Christian
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users