Hi Christopher,
What version of Linux are you running? If it's CentOS/RedHat 5.5, you may be
seeing the bug that I referred to in the attached post to the list, subject:
[Gluster-users] CentOS 5.5 kernel bugs can cause temporary hangs upon client
access to GlusterFS
James Burnash
Unix Engineer
Knight Capital Group
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Christopher Anderlik
Sent: Wednesday, July 13, 2011 3:29 AM
To: [email protected]
Subject: [Gluster-users] 3.2.1 - sometimes network outages
hi list,
we use glusterfs 3.2.1 with this configuration (one server - one client):
Volume Name: office-data
Type: Replicate
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gfs-01-01:/GFS/office-data02
Options Reconfigured:
performance.cache-size: 512MB
performance.quick-read: off
we "sometimes" noticed network outages from glusterfs - so for some
seconds/minutes the client is not able to access the mounted glusterfs-share.
please see the picture enclosed or here:
http://www.xidrasservice.com/gfs-network.JPG
there are no entries in client- or server-log.
everything else works fine - so there is no general network problem.
do you have any idea, what this might be?
what else can we check?
any help would be appreciated.
thx!!
christopher
DISCLAIMER:
This e-mail, and any attachments thereto, is intended only for use by the
addressee(s) named herein and may contain legally privileged and/or
confidential information. If you are not the intended recipient of this e-mail,
you are hereby notified that any dissemination, distribution or copying of this
e-mail, and any attachments thereto, is strictly prohibited. If you have
received this in error, please immediately notify me and permanently delete the
original and any copy of any e-mail and any printout thereof. E-mail
transmission cannot be guaranteed to be secure or error-free. The sender
therefore does not accept liability for any errors or omissions in the contents
of this message which arise as a result of e-mail transmission.
NOTICE REGARDING PRIVACY AND CONFIDENTIALITY Knight Capital Group may, at its
discretion, monitor and review the content of all e-mail communications.
http://www.knight.com
--- Begin Message ---
Got a complaint from a user - the native GlusterFS mountpoint was completely
inaccessible from many (if not all) clients attempting to read or write from it.
Apparently not the fault of GlusterFS - here's the entry from the messages file:
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.692284] INFO: task glusterfsd:12902 blocked for more than 120 seconds.
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.692544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.693037] glusterfsd D ffffffff80151248 0 12902 1 12904 12903 (NOTLB)
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.693553] ffff81061190bbf8 0000000000000086 ffff81061190bea8 0000000000000000
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.694099] 000000000000000c 000000000000000a ffff810627eec0c0 ffff810c27f32100
Jul 8 16:15:13 jc1letgfs13 kernel: [3022057.694660] 000abc5dc58f770c 0000000000005135 ffff810627eec2a8 000000038000b3fd
... and here's one for a non-Gluster process:
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.761299] INFO: task jbd2/cciss!c2d0:4090 blocked for more than 120 seconds.
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.761908] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.762505] jbd2/cciss!c2 D ffffffff80151248 0 4090 456 4091 4085 (L-TLB)
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.763129] ffff810617e45d60 0000000000000046 ffff810617e45da0 ffffffff8008ccb0
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.763753] ffff810617e45cf0 000000000000000a ffff81063d22e820 ffff810c20b3c100
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.764370] 000abbf070cd535b 0000000000003c6a ffff81063d22ea08 0000000300000000
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.764693] Call Trace:
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.765247] [<ffffffff8008ccb0>] find_busiest_group+0x20d/0x621
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.765543] [<ffffffff88342fad>] :jbd2:jbd2_journal_commit_transaction+0x191/0x1080
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.766064] [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.766327] [<ffffffff8003ddd5>] lock_timer_base+0x1b/0x3c
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.766588] [<ffffffff8004b6b6>] try_to_del_timer_sync+0x7f/0x88
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.766853] [<ffffffff88346d72>] :jbd2:kjournald2+0x9a/0x1ec
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.767109] [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.767374] [<ffffffff88346cd8>] :jbd2:kjournald2+0x0/0x1ec
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.767627] [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.767880] [<ffffffff80032bdc>] kthread+0xfe/0x132
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.768138] [<ffffffff8005efb1>] child_rip+0xa/0x11
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.768399] [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.768656] [<ffffffff80032ade>] kthread+0x0/0x132
Jul 8 16:07:13 jc1letgfs13 kernel: [3021577.768922] [<ffffffff8005efa7>] child_rip+0x0/0x11
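For anyone who wants to check their own servers for the same symptom, here is a minimal Python sketch that picks the "blocked for more than N seconds" warnings out of syslog lines. It assumes each kernel message sits on a single line in the log file (unlike the wrapped copies in this mail); the function name find_hung_tasks is purely illustrative, not part of any tool.

```python
import re

# Matches kernel hung-task warnings of the form seen above, e.g.
#   "... INFO: task glusterfsd:12902 blocked for more than 120 seconds."
# Assumption: the whole warning is on one line in the log file.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return a list of (task name, pid, seconds) for each hung-task warning."""
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits
```

It could be run against the messages file (commonly /var/log/messages on CentOS, though the location depends on the syslog configuration) with something like `find_hung_tasks(open("/var/log/messages", errors="replace"))`. An empty result only means no such warnings were logged, not that the kernel issue is ruled out.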
Haven't found the specific bug number for this (CentOS 5.5) yet.
Running GlusterFS 3.1.3 on clients and 2 servers set up as
Replicated-Distribute.
Hopefully this will help others. I will be upgrading to CentOS 5.6 as soon as
possible on these servers.
Kudos to my coworker Joe Collette for running this issue to ground.
James Burnash
Unix Engineer
Knight Capital Group
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
--- End Message ---