Hi Bob,

Thanks for the links to your tools. I'm going to try them as soon as possible.
Am I right that debugfs needs to be enabled for those tools to work?

Since you're so involved with this filesystem, you may be able to answer the
following question so that we don't need any further testing: we're currently
thinking about growing the cluster's disk space (iSCSI connection). It's about
3TB now, and we want to grow it by another 3TB.
When there are many locks, we see dlm_controld using up to 20% of the CPU, and
the file system access rate drops dramatically, which drives the load average
on the nodes up to 130 because of the I/O wait time.
Since we want to grow the disk space, we don't want to make the system unstable
or unusable because of all those wait times. Does it make a difference if we
make the new 3TB partition a new iSCSI target, and therefore a new GFS2
filesystem, or will the higher I/O waits / lock times on the first iSCSI target
also affect the new one? Another big question is whether dlm_controld scales
well enough to handle those two targets separately.

Another weird behavior shows up with many files in a single directory. We have
a directory with about 100,000 pictures in it (100 GB of data); it takes nearly
forever to do something like "ls", or even worse "ls -la", and the load
explodes on all nodes. Is there some known limitation with many files in a
single directory?

Do you have any idea when you're going to release the next kernel version?
Since CentOS more or less sticks to the RHEL kernel cycles, this would give us
some hint about when to expect improvements. The latest kernel, 2.6.32-358.el6,
is from 2013-02-21 and is not usable because of severe bugs that cause node
fencing and filesystem revoking, so we're using 2.6.32-279.22.1.el6.x86_64 now,
which is quite old and lacks a lot of features.

Thanks again, Jürgen

-----Original Message-----
From: linux-cluster-boun...@redhat.com [mailto:linux-cluster-boun...@redhat.com]
On Behalf Of Bob Peterson
Sent: Wednesday, 06 November 2013 14:39
To: linux clustering
Subject: Re: [Linux-cluster] gfs2 kernel versions

----- Original Message -----
| Hi Bob,
| 
| First of all, thank you for your amazing work.
| 
| Do you include any kind of versioning with your releases so that we
| can check which GFS2 version is running on our Gentoo system with the 3.1
| kernel, and which version is running on the 2.6.32 kernel on CentOS?
| The PHP processes hanging in D state are rather annoying, and it's not
| possible to use the latest CentOS kernel because of severe crashes under
| certain conditions.
|
| Since I'm very familiar with kernels (Gentoo requires that you build
| your own), I'm pretty sure that we can build and use a regular
| mainline kernel from kernel.org. It also looks like a lot of
| development is being done by you and Mr. Whitehouse.
|
| You say that "The more recent the version, the better and faster GFS2
| should be". Do you mean the kernel version or the GFS2 version? If the
| latter, how can we find out which version we're running?
| 
| Thanks in advance,
| Juergen

Hi Jürgen,

There aren't really any version markers to identify which patches are in which
kernel. With RHEL, CentOS and Fedora kernels, you can get the kernel version and
trace it back to the tags in the source git repository. In other words, if you
have kernel version 2.6.32-358.20.1.el6, you can go to the source repository and
look through the commit messages to figure out what's in there.
Aside from that, it's hard to tell which patches are where; it's not
straightforward.
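As a minimal sketch of the first step (finding out which kernel you are on and
ordering two kernels before digging through the repository), the Python below
splits a RHEL/CentOS-style release string such as 2.6.32-358.20.1.el6 into the
upstream base version and the distribution release numbers. The assumed layout
<base>-<release>.<dist>[.<arch>] is inferred from the version strings quoted in
this thread; the code does not look anything up in git, and parse_kernel is a
hypothetical helper, not part of any Red Hat tool.

```python
import platform
import re
from typing import NamedTuple, Optional, Tuple


class RhelKernel(NamedTuple):
    base: Tuple[int, ...]     # upstream version, e.g. (2, 6, 32)
    release: Tuple[int, ...]  # distribution build, e.g. (358, 20, 1)
    dist: str                 # distribution tag, e.g. "el6"
    arch: Optional[str]       # architecture suffix, if present


# Assumed layout: <base>-<release>.<dist>[.<arch>], e.g. 2.6.32-279.22.1.el6.x86_64
_PATTERN = re.compile(
    r"^(?P<base>\d+(?:\.\d+)*)-(?P<release>\d+(?:\.\d+)*)"
    r"\.(?P<dist>el\d+)(?:\.(?P<arch>\w+))?$"
)


def parse_kernel(version: str) -> RhelKernel:
    """Split a RHEL/CentOS kernel release string into comparable parts."""
    m = _PATTERN.match(version)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return RhelKernel(
        base=tuple(int(x) for x in m["base"].split(".")),
        release=tuple(int(x) for x in m["release"].split(".")),
        dist=m["dist"],
        arch=m["arch"],
    )


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    try:
        print(parse_kernel(running))
    except ValueError as err:  # e.g. a self-built Gentoo kernel
        print(err)
```

To order two kernels, compare their release fields, for example
parse_kernel("2.6.32-279.22.1.el6.x86_64").release against
parse_kernel("2.6.32-358.el6").release; the integer tuples sort numerically,
so 279.22.1 comes before 358.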

Debugging performance problems is a challenge, and there are many variables to
look at. If it's a straight-up hang, we have tools like my "gfs2_hangalyzer"
tool on my people page:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/gfs2_hangalyzer.c

If it's not a true hang, just slowness, you can check for GFS2 lock contention
between the nodes with another tool I wrote: glocktop.c (same directory).
The glocktop tool is like "top" in that it shows which glocks are being waited
for, and their status. If a glock belongs to a directory, it will even give you
the directory name.
The tool does its job by taking glock dumps and extracting the ones on which
processes are waiting. Before you run it, you should make sure your version of
GFS2 has the patch for "faster glock dumps". With that patch, a glock dump
should take less than a second. Without it, a glock dump can take a very long
time.
(A glock dump is what you get by catting
/sys/kernel/debug/gfs2/<lock table name>/glocks.)
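To illustrate the idea glocktop is built on (this is not glocktop itself), here
is a minimal Python sketch that reads one glock dump and lists the glocks that
have waiting holders, most-waited-for first. It assumes debugfs is mounted at
/sys/kernel/debug, that glock records in the dump start with "G:" and holder
records with "H:", and that a waiting holder carries a W in its f: flags field;
check these assumptions against the dump format of your own kernel. The script
name glockwait.py and all function names are hypothetical, and the lock table
name passed on the command line is your own.

```python
import sys
from typing import Iterable, List, Optional, Tuple

DEBUGFS = "/sys/kernel/debug"  # assumed debugfs mount point, as in the path above


def debugfs_mounted(mounts_file: str = "/proc/mounts") -> bool:
    """Return True if any entry in mounts_file has filesystem type debugfs."""
    with open(mounts_file) as f:
        # /proc/mounts fields: device, mount point, fs type, options, ...
        return any(line.split()[2:3] == ["debugfs"] for line in f)


def holder_is_waiting(line: str) -> bool:
    """Assumed holder format: 'H: s:SH f:W e:0 p:1234 [php] ...'; W = waiting."""
    for field in line.split():
        if field.startswith("f:"):
            return "W" in field[2:]
    return False


def glocks_with_waiters(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (glock line, number of waiting holders) for each contended glock."""
    result: List[Tuple[str, int]] = []
    current: Optional[str] = None
    waiters = 0
    for raw in lines:
        line = raw.strip()
        if line.startswith("G:"):
            # A new glock record closes the previous one.
            if current is not None and waiters:
                result.append((current, waiters))
            current, waiters = line, 0
        elif line.startswith("H:") and current is not None and holder_is_waiting(line):
            waiters += 1
    if current is not None and waiters:
        result.append((current, waiters))
    return result


def main(table: str) -> int:
    """Print the contended glocks of one GFS2 lock table."""
    if not debugfs_mounted():
        print("debugfs is not mounted; try: mount -t debugfs none /sys/kernel/debug",
              file=sys.stderr)
        return 1
    # Without the "faster glock dumps" patch this read may take a long time.
    with open(f"{DEBUGFS}/gfs2/{table}/glocks") as dump:
        contended = glocks_with_waiters(dump)
    for glock, count in sorted(contended, key=lambda t: t[1], reverse=True):
        print(f"{count:4d} waiting  {glock}")
    return 0


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(main(sys.argv[1]))
    print("usage: glockwait.py <lock table name>", file=sys.stderr)
```

Running it once per node and comparing the output is one way to see whether
the same glocks are contended cluster-wide; glocktop itself does considerably
more, such as resolving directory names.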

If it's not glock contention, it could be many things. You just have to go 
through all the possibilities and see where the bottlenecks are.

Yes, lots of development going on, still. :)

When I spoke about using the most recent code, I meant the most recent GFS2 
code, which is usually coupled to a given kernel.

Regards,

Bob Peterson
Red Hat File Systems

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
