Hi Christopher,

What version of Linux are you running? If it's CentOS/RedHat 5.5, you may be 
seeing the bug that I referred to in the attached post to the list, subject:

[Gluster-users] CentOS 5.5 kernel bugs can cause temporary hangs upon client 
access to GlusterFS
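
A quick way to check the release on each box is to parse the distribution's release string. This is only a minimal sketch: the function names are hypothetical, and it assumes the string has the usual "... release X.Y ..." form found in /etc/redhat-release on CentOS/RedHat systems. The release number is only a proxy for the kernel that is actually running, so treat a match as a hint rather than a diagnosis.

```python
import re

def parse_release(text):
    """Extract (major, minor) from a release string, or None if not found."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None

def is_centos_55(text):
    """True when the release string reports version 5.5 (hypothetical helper)."""
    return parse_release(text) == (5, 5)
```

On a live system this could be called as is_centos_55(open("/etc/redhat-release").read()), assuming that file exists on the host.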

James Burnash
Unix Engineer
Knight Capital Group


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Christopher Anderlik
Sent: Wednesday, July 13, 2011 3:29 AM
To: [email protected]
Subject: [Gluster-users] 3.2.1 - sometimes network outages

hi list,

we use glusterfs 3.2.1 with this configuration (one server - one client):

Volume Name: office-data
Type: Replicate
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gfs-01-01:/GFS/office-data02
Options Reconfigured:
performance.cache-size: 512MB
performance.quick-read: off


we "sometimes" noticed network outages from glusterfs - so for some 
seconds/minutes the client is not able to access the mounted glusterfs-share.

please see the picture enclosed or here:
http://www.xidrasservice.com/gfs-network.JPG

there are no entries in client- or server-log.
everything else works fine - so there is no general network problem.

do you have any idea, what this might be?
what else can we check?

any help would be appreciated.

thx!!
christopher



DISCLAIMER:
This e-mail, and any attachments thereto, is intended only for use by the 
addressee(s) named herein and may contain legally privileged and/or 
confidential information. If you are not the intended recipient of this e-mail, 
you are hereby notified that any dissemination, distribution or copying of this 
e-mail, and any attachments thereto, is strictly prohibited. If you have 
received this in error, please immediately notify me and permanently delete the 
original and any copy of any e-mail and any printout thereof. E-mail 
transmission cannot be guaranteed to be secure or error-free. The sender 
therefore does not accept liability for any errors or omissions in the contents 
of this message which arise as a result of e-mail transmission.
NOTICE REGARDING PRIVACY AND CONFIDENTIALITY Knight Capital Group may, at its 
discretion, monitor and review the content of all e-mail communications. 
http://www.knight.com
--- Begin Message ---
Got a complaint from a user - the native GlusterFS mountpoint was completely 
inaccessible from many (if not all) clients attempting to read or write from it.

Apparently not the fault of GlusterFS - here's the entry from the messages file:

Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.692284] INFO: task glusterfsd:12902 blocked for more than 120 seconds.
Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.692544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.693037] glusterfsd    D ffffffff80151248     0 12902      1         12904 12903 (NOTLB)
Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.693553]  ffff81061190bbf8 0000000000000086 ffff81061190bea8 0000000000000000
Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.694099]  000000000000000c 000000000000000a ffff810627eec0c0 ffff810c27f32100
Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.694660]  000abc5dc58f770c 0000000000005135 ffff810627eec2a8 000000038000b3fd

... and here's one for a non-Gluster process:

Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.761299] INFO: task jbd2/cciss!c2d0:4090 blocked for more than 120 seconds.
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.761908] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.762505] jbd2/cciss!c2 D ffffffff80151248     0  4090    456          4091  4085 (L-TLB)
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.763129]  ffff810617e45d60 0000000000000046 ffff810617e45da0 ffffffff8008ccb0
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.763753]  ffff810617e45cf0 000000000000000a ffff81063d22e820 ffff810c20b3c100
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.764370]  000abbf070cd535b 0000000000003c6a ffff81063d22ea08 0000000300000000
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.764693] Call Trace:
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.765247]  [<ffffffff8008ccb0>] find_busiest_group+0x20d/0x621
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.765543]  [<ffffffff88342fad>] :jbd2:jbd2_journal_commit_transaction+0x191/0x1080
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.766064]  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.766327]  [<ffffffff8003ddd5>] lock_timer_base+0x1b/0x3c
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.766588]  [<ffffffff8004b6b6>] try_to_del_timer_sync+0x7f/0x88
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.766853]  [<ffffffff88346d72>] :jbd2:kjournald2+0x9a/0x1ec
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.767109]  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.767374]  [<ffffffff88346cd8>] :jbd2:kjournald2+0x0/0x1ec
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.767627]  [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.767880]  [<ffffffff80032bdc>] kthread+0xfe/0x132
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.768138]  [<ffffffff8005efb1>] child_rip+0xa/0x11
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.768399]  [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.768656]  [<ffffffff80032ade>] kthread+0x0/0x132
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.768922]  [<ffffffff8005efa7>] child_rip+0x0/0x11
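
For anyone who wants to check their own servers for the same symptom, the hung-task warnings above can be picked out of a log mechanically. The following is only a sketch under stated assumptions: the function name is hypothetical, each warning is assumed to sit on a single unwrapped line, and the pattern is taken from the message text quoted in this post.

```python
import re

# Matches the kernel hung-task warning quoted above, e.g.
# "INFO: task glusterfsd:12902 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

def find_hung_tasks(lines):
    """Return a (task, pid, seconds) tuple for every hung-task warning."""
    hits = []
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            hits.append((m.group("task"), int(m.group("pid")), int(m.group("secs"))))
    return hits
```

It could be fed the lines of the messages file (commonly /var/log/messages on CentOS, which is an assumption about the local syslog setup). Seeing non-Gluster tasks such as jbd2 in the result would point toward the kernel or storage layer rather than GlusterFS itself.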

Haven't found the specific bug number for this (CentOS 5.5) yet.

Running GlusterFS 3.1.3 on clients and 2 servers set up as 
Replicated-Distribute.

Hopefully this will help others. I will be upgrading to CentOS 5.6 as soon as 
possible on these servers.

Kudos to my coworker Joe Collette for running this issue to ground.

James Burnash
Unix Engineer
Knight Capital Group



_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

--- End Message ---
