I've made a few attempts at using glusterfs as the php session store for a heavily-used web app (~6 million pages daily), and time and again I've found that cpu usage and odd load characteristics make glusterfs entirely unsuitable for this use case, at least given my configuration. I posted about this earlier, but I'm hoping I can get some more input: things are much better than they were, but still not good enough. I'm on v2.0.9 since the 3.0.x series doesn't seem to be fully settled yet, though feel free to correct me on that.

I have a two-node replicate setup and four clients; configs are below. What I see is that one brick gets pegged (load average of 8) while the other sits much more idle (load average of 1). The pegged node ends up with high run queues and i/o-blocked processes. CPU usage for the glusterfs processes on the clients gets pretty high, consuming at least one entire cpu when not spiking to consume both. I run very high thread counts on the clients in the hope of avoiding thread waits on i/o requests. All six machines are identical xen instances.

When one of the bricks is down, cpu usage across the board goes way down, interactivity goes way up, and things are overall a whole lot better. Why is that? I would have thought that having two nodes would at least give better read rates.
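
For what it's worth, my replicate volume is completely stock (no read-subvolume or anything like that set). If I'm reading the 2.0.x replicate options right, pinning reads to one of the bricks would look something like the sketch below -- I haven't actually tried this, so treat the option name as an assumption on my part:

volume replicated
  type cluster/replicate
  option read-subvolume glusterfs0-hs   # untested; assuming the 2.0.x AFR option name
  subvolumes glusterfs0-hs glusterfs1-hs
end-volume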

I've gone through various caching schemes and tried read-ahead, write-behind, quick-read, and stat-prefetch. Quick-read caused a ton of memory consumption and didn't help performance, and I didn't see much of a change at all with stat-prefetch.
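
For reference, when I was testing quick-read and stat-prefetch I stacked them on top of the io-threads volume roughly like this (reconstructed from memory, so the option names may be slightly off):

volume quickread
  type performance/quick-read
  option cache-timeout 1        # seconds, as I recall
  option max-file-size 64KB     # php session files are small
  subvolumes iothreads
end-volume
volume statprefetch
  type performance/stat-prefetch
  subvolumes quickread
end-volume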

...Any thoughts?

### fsd.vol:

volume sessions
  type storage/posix
  option directory /var/glusterfs/sessions
  option o-direct off
end-volume
volume data
  type storage/posix
  option directory /var/glusterfs/data
  option o-direct off
end-volume
volume locks0
  type features/locks
  option mandatory-locks on
  subvolumes data
end-volume
volume locks1
  type features/locks
  option mandatory-locks on
  subvolumes sessions
end-volume
volume brick0
  type performance/io-threads
  option thread-count 32 # default is 16
  subvolumes locks0
end-volume
volume brick1
  type performance/io-threads
  option thread-count 32 # default is 16
  subvolumes locks1
end-volume
volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.nodelay on
  subvolumes brick0 brick1
  option auth.addr.brick0.allow ip's...
  option auth.addr.brick1.allow ip's...
end-volume


### client.vol (just one connection shown here)

volume glusterfs0-hs
  type protocol/client
  option transport-type tcp
  option remote-host "ip"
  option ping-timeout 10
  option transport.socket.nodelay on
  option remote-subvolume brick1
end-volume
volume glusterfs1-hs
  type protocol/client
  option transport-type tcp
  option remote-host "ip"
  option ping-timeout 10
  option transport.socket.nodelay on
  option remote-subvolume brick1
end-volume
volume replicated
  type cluster/replicate
  subvolumes glusterfs0-hs glusterfs1-hs
end-volume
volume iocache
  type performance/io-cache
  option cache-size 512MB
  option cache-timeout 10
  subvolumes replicated
end-volume
volume writeback
  type performance/write-behind
  option cache-size 128MB
  option flush-behind off
  subvolumes iocache
end-volume
volume iothreads
  type performance/io-threads
  option thread-count 100
  subvolumes writeback
end-volume





--
John Madden
Sr UNIX Systems Engineer
Ivy Tech Community College of Indiana
[email protected]