Hi Christian,
I have been working on this for a couple of days, but I have not been
able to recreate the issue. I will continue trying to recreate it and
get back to you in a day or two.
Regards,
Raghavendra Bhat
On 09/02/2015 12:45 AM, Christian Rice wrote:
This is still an issue for me. I don’t need anyone to tear the code
apart, but I’d be grateful if someone would even chime in and say,
“yeah, we’ve seen that too.”
From: Christian Rice <[email protected]>
Date: Sunday, August 30, 2015 at 11:18 PM
To: "[email protected]" <[email protected]>
Subject: [Gluster-users] Gluster 3.6.3 performance.cache-size not
working as expected in some cases
I am confused about my caching problem. I’ll try to keep this as
straightforward as possible and include the basic details...
I have a sixteen-node distributed volume, one brick per node, XFS
isize=512, Debian 7/Wheezy, at least 32GB RAM per node. Every brick
node is also a gluster client and, importantly, an HTTP server. We use
a back-end 1GbE network for gluster traffic (eth1). There are a couple
dozen gluster client-only systems accessing this volume as well.
We had a really hot spot on one brick due to an oft-requested file:
every time any httpd process on any gluster client was asked to
deliver the file, it physically fetched it over the back end (we could
see this traffic using, say, ‘iftop -i eth1’), so we decided to
increase the volume cache timeout and cache size. We set the following
values for testing:
performance.cache-size: 16GB
performance.cache-refresh-timeout: 30
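For reference, those options were applied with the standard volume-set
commands, roughly like this (volume name as in the info output further
down):
gluster volume set DOCROOT performance.cache-size 16GB
gluster volume set DOCROOT performance.cache-refresh-timeout 30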
This test was run from a node that didn’t have the requested file on
the local brick:
while true; do cat /path/to/file > /dev/null; done
and what had been very high traffic on the gluster backend network,
delivering the data repeatedly to my requesting node, dropped to
nothing visible.
I thought: good, problem fixed. Caching works. My colleague had run a
test early on to show this perf issue, so he ran it again to sign off.
His testing used curl, because all the real front-end traffic is HTTP,
and all the gluster nodes are web servers, which of course use the
fuse mount to access the document root. Even with our performance
tuning, the traffic on the gluster backend subnet was continuous and
undiminished. I saw no evidence of caching (again using ‘iftop -i
eth1’, which showed a steady 75+% of line rate on a 1GbE link).
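Roughly speaking, his test loop looked something like this (the host
and path here are just placeholders, not his literal command):
while true; do curl -s -o /dev/null http://localhost/path/to/file; done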
Does that make sense at all? We had theorized that we wouldn’t get to
use the VFS/kernel page cache on any node except maybe the one holding
the data on its local brick. That’s what drove us to set the gluster
performance cache. But it doesn’t seem to come into play with HTTP
access.
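One sanity check that might help (just a suggestion, not something
we’ve run yet) would be to drop the kernel caches on both a client
node and the brick node holding the file, then repeat the cat and curl
loops while watching the back-end interface:
sync; echo 3 > /proc/sys/vm/drop_caches   # as root: drops page cache, dentries and inodes
iftop -i eth1                             # watch gluster back-end traffic while the loops run again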
Volume info:
Volume Name: DOCROOT
Type: Distribute
Volume ID: 3aecd277-4d26-44cd-879d-cffbb1fec6ba
Status: Started
Number of Bricks: 16
Transport-type: tcp
Bricks:
<snipped list of bricks>
Options Reconfigured:
performance.cache-refresh-timeout: 30
performance.cache-size: 16GB
The net result of being overwhelmed by a hot spot is that all the
gluster client nodes lose access to the gluster volume; it becomes so
busy it hangs. When the traffic goes away (failing health checks cause
the load balancers to redirect requests elsewhere), the volume
eventually unfreezes and life goes on.
I wish I could type ALL that into a Google query and get a lucid answer :)
Regards,
Christian
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users