Re: [Gluster-users] uninterruptible processes writing to glusterfsshare

Markus Fröhlich Tue, 07 Jun 2011 06:53:35 -0700

hi!

there is no relevant output from dmesg.

there are no entries in the server log, only the single line in the client-server log that I already posted.

the glusterfs version on the server was updated to gfs 3.2.0 more than a month ago. because of the troubles on the backup server, I deleted the whole backup share and started from scratch.

I looked for an update of "fuse" and upgraded from 2.7.2-61.18.1 to 2.8.5-41.1

maybe this helps.

here is the changelog info:

Authors:
--------
    Miklos Szeredi <[email protected]>
Distribution: systemsmanagement:baracus / SLE_11_SP1
* Tue Mar 29 2011 [email protected]
- remove the --no-canonicalize usage for suse_version <= 11.3

* Mon Mar 21 2011 [email protected]
- licenses package is about to die

* Thu Feb 17 2011 [email protected]
- In case of failure to add to /etc/mtab don't umount. [bnc#668820]
  [CVE-2011-0541]

* Tue Nov 16 2010 [email protected]
- Fix symlink attack for mount and umount [bnc#651598]

* Wed Oct 27 2010 [email protected]
- Remove /etc/init.d/boot.fuse [bnc#648843]

* Tue Sep 28 2010 [email protected]
- update to 2.8.5
  * fix option escaping for fusermount [bnc#641480]

* Wed Apr 28 2010 [email protected]
- keep examples and internal docs in devel package (from jnweiger)

* Mon Apr 26 2010 [email protected]
- update to 2.8.4
  * fix checking for symlinks in umount from /tmp
  * fix umounting if /tmp is a symlink


kind regards
markus froehlich

On 06.06.2011 21:19, Anthony J. Biacco wrote:

Could be fuse, check 'dmesg' for kernel module timeouts.
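
A minimal sketch of that dmesg check, assuming the kernel was built with hung-task detection and logs its usual "blocked for more than N seconds" warning. The function names find_hung_tasks and read_dmesg are made up for this illustration, and reading dmesg may need root on some systems.

```python
import re
import subprocess

# Hung-task warnings from the kernel usually look like:
#   INFO: task rsync:1234 blocked for more than 120 seconds.
HUNG_TASK = re.compile(r"task (\S+):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(dmesg_text):
    """Return (command, pid, seconds) for each hung-task warning in the text."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in HUNG_TASK.finditer(dmesg_text)
    ]


def read_dmesg():
    """Read the kernel ring buffer by running the dmesg command."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling find_hung_tasks(read_dmesg()) on an affected client would list any tasks the kernel has flagged. An empty list only means no such warning is in the ring buffer, not that nothing is blocked.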

In a similar vein, has anyone seen significant performance/reliability
differences between fuse versions? Say, latest source vs. RHEL distro RPM versions.

-Tony



-----Original Message-----
From: Mohit Anchlia<[email protected]>
Sent: June 06, 2011 1:14 PM
To: Markus Fröhlich<[email protected]>
Cc: [email protected]<[email protected]>
Subject: Re: [Gluster-users] uninterruptible processes writing to glusterfsshare

Is there anything in the server logs? Does it follow any particular
pattern before going in this mode?

Did you upgrade Gluster or is this new install?

2011/6/6 Markus Fröhlich<[email protected]>:

hi!

sometimes, on some client-servers, we have hanging uninterruptible processes
(the "ps aux" STAT column shows "D"), and on one of them the CPU I/O wait
grows to 100% within a few minutes.
you cannot kill such processes, even "kill -9" doesn't work. when you attach
to such a process with "strace", you won't see anything and you cannot
detach from it again.
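
A small sketch for collecting the processes described above, assuming a procps-style "ps" that accepts "-eo pid,stat,wchan:32,args". The helper names parse_d_state and read_ps are hypothetical. The wchan column shows the kernel function a process is sleeping in, which may hint at whether it is stuck inside fuse.

```python
import subprocess


def parse_d_state(ps_text):
    """Return (pid, stat, wchan, command) for processes whose state starts with 'D'."""
    blocked = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # the command itself may contain spaces
        if len(fields) < 4:
            continue
        pid, stat, wchan, command = fields
        if stat.startswith("D"):
            blocked.append((int(pid), stat, wchan, command))
    return blocked


def read_ps():
    """Run ps and return pid, state, wait channel and command line as text."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,wchan:32,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Running parse_d_state(read_ps()) a few times, some seconds apart, would show whether the same PIDs stay in "D" or whether the list changes.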

there are only two possibilities:
killing the glusterfs process (umount GFS share) or rebooting the server.

the only log entry I found was on one client, just a single line:
[2011-06-06 10:44:18.593211] I [afr-common.c:581:afr_lookup_collect_xattr]
0-office-data-replicate-0: data self-heal is pending for
/pc-partnerbet-public/Promotionaktionen/Mailakquise_2009/Webmaster_2010/HTML/bilder/Thumbs.db.

one of the client-servers is a samba-server, the other one a backup-server
based on rsync with millions of small files.

gfs-servers + gfs-clients: SLES11 x86_64, glusterfs V 3.2.0

and here are the configs from server and client:
server config
"/etc/glusterd/vols/office-data/office-data.gfs-01-01.GFS-office-data02.vol":
volume office-data-posix
    type storage/posix
    option directory /GFS/office-data02
end-volume

volume office-data-access-control
    type features/access-control
    subvolumes office-data-posix
end-volume

volume office-data-locks
    type features/locks
    subvolumes office-data-access-control
end-volume

volume office-data-io-threads
    type performance/io-threads
    subvolumes office-data-locks
end-volume

volume office-data-marker
    type features/marker
    option volume-uuid 3c6e633d-a0bb-4c52-8f05-a2db9bc9c659
    option timestamp-file /etc/glusterd/vols/office-data/marker.tstamp
    option xtime off
    option quota off
    subvolumes office-data-io-threads
end-volume

volume /GFS/office-data02
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes office-data-marker
end-volume

volume office-data-server
    type protocol/server
    option transport-type tcp
    option auth.addr./GFS/office-data02.allow *
    subvolumes /GFS/office-data02
end-volume


--------------
client config "/etc/glusterd/vols/office-data/office-data-fuse.vol":
volume office-data-client-0
    type protocol/client
    option remote-host gfs-01-01
    option remote-subvolume /GFS/office-data02
    option transport-type tcp
end-volume

volume office-data-replicate-0
    type cluster/replicate
    subvolumes office-data-client-0
end-volume

volume office-data-write-behind
    type performance/write-behind
    subvolumes office-data-replicate-0
end-volume

volume office-data-read-ahead
    type performance/read-ahead
    subvolumes office-data-write-behind
end-volume

volume office-data-io-cache
    type performance/io-cache
    subvolumes office-data-read-ahead
end-volume

volume office-data-quick-read
    type performance/quick-read
    subvolumes office-data-io-cache
end-volume

volume office-data-stat-prefetch
    type performance/stat-prefetch
    subvolumes office-data-quick-read
end-volume

volume office-data
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes office-data-stat-prefetch
end-volume
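
To make the layering in a volfile like the ones above easier to see, here is a rough sketch that reads the "volume ... end-volume" blocks and prints the translator stack from top to bottom. It is not part of GlusterFS. parse_volfile and translator_stack are names invented for this example, and it only follows the first subvolume of each volume.

```python
def parse_volfile(text):
    """Parse 'volume ... end-volume' blocks into {name: {"type", "subvolumes"}}."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0] == "volume" and len(words) > 1:
            current = words[1]
            volumes[current] = {"type": None, "subvolumes": []}
        elif words[0] == "end-volume":
            current = None
        elif current is not None:
            if words[0] == "type" and len(words) > 1:
                volumes[current]["type"] = words[1]
            elif words[0] == "subvolumes":
                volumes[current]["subvolumes"] = words[1:]
    return volumes


def translator_stack(volumes):
    """Walk from the top volume (listed by no other volume) down its first subvolume."""
    referenced = {s for v in volumes.values() for s in v["subvolumes"]}
    tops = [name for name in volumes if name not in referenced]
    stack = []
    seen = set()
    name = tops[0] if tops else None
    while name in volumes and name not in seen:  # guard against cycles
        seen.add(name)
        stack.append((name, volumes[name]["type"]))
        subs = volumes[name]["subvolumes"]
        name = subs[0] if subs else None
    return stack
```

Fed the client volfile quoted above, this would walk from office-data (debug/io-stats) down through the performance translators to office-data-replicate-0 and office-data-client-0. In the posted config, the replicate volume lists only one subvolume.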


  -- Kind regards

Markus Fröhlich
Technician

_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
