hi.
thanks for your feedback!
please also update the related bug with your information and your logs:
http://bugs.gluster.com/show_bug.cgi?id=3011
cheers
Am 14.06.2011 15:44, schrieb Jiri Lunacek:
Another update...
Setting io-cache off, stat-prefetch off and io-thread-count 64 DOES NOT SOLVE
the problem. It only
narrows down the conditions in which the problem occurs.
I have run the same benchmark with these settings and ended up with the same
result. 55 hung
processes in sync_page ioctl.
I am attaching a dump from this test.
Jiri
Begin forwarded message:
*From: *Jiri Lunacek <[email protected]
<mailto:[email protected]>>
*Date: *14. června 2011 15:32:22 GMT+02:00
*To: *[email protected] <mailto:[email protected]>
*Subject: **[Gluster-users] Fwd: Apache hung tasks still occur with glusterfs
3.2.1*
I am back with the results.
Glusterdump is attached to this e-mail.
I found a way of reproducing this issue, I just ran:
for i in {1..1000}; do echo
"http://server.address.com/file/hosted/from/gluster.jpg"; done | xargs
-L1 -P20 wget -O/dev/null
i.e. multiple synchronous requests on an individual file through apache.
With io-cache enabled this caused over 70 apache processes to hang in sync_page
ioctl.
Glusterdump is attached to this e-mail.
I have the server back with the work-around parameters. I am hoping that the
dump will help you
guys isolate the bug.
Jiri
Begin forwarded message:
*From: *Jiri Lunacek <[email protected]
<mailto:[email protected]>>
*Date: *14. června 2011 14:59:56 GMT+02:00
*To: *[email protected] <mailto:[email protected]>
*Subject: **Re: [Gluster-users] Apache hung tasks still occur with glusterfs
3.2.1*
Hi.
hello.
do you maybe have already feedback?
was it successfull? (disabled io-cache, disabled stat-prefetch, inreades
io-thread count to 64)
For now it seems that the work-arround has worked. We have not encountered any
hung processes on
the server since the change (io-cache disable, stat-prefetch disable
io-thread-count=64).
The only "bad" influence is expectable, the pages (mainly list of several
hundred images per
page) take a little while longer. Of course this is caused by the files not
being cached.
is/was your problem similar to this one?
http://bugs.gluster.com/show_bug.cgi?id=3011
The symptoms were the same. The processes were hung on ioctl. /proc//wchan for
the PIDs showed
"sync_page".
I'll experiment a bit once again today and set the volume back to original
parameters and wait
for a hung process to get you the information (/tmp/glusterdump.pid).
I'll report back later.
Jiri
Am 13.06.2011 19:14, schrieb Jiri Lunacek:
Thanks for the tip. I disabled io-cache and stat-prefetch, increased
io-thread-count to 64 and
rebooted the server to clean off the hung apache processes. We'll see tomorrow.
On 13.6.2011, at 15:58, Justice London wrote:
Disable io-cache and up the threads to 64 and your problems should disappear.
They did for me when
I made both of these changes.
Justice London
*From:*[email protected]
<mailto:*[email protected]>
<mailto:[email protected]>[mailto:[email protected]]*On
Behalf
Of*Jiri Lunacek
*Sent:*Monday, June 13, 2011 1:49 AM
*To:*[email protected] <mailto:*[email protected]>
<mailto:[email protected]>
*Subject:*[Gluster-users] Apache hung tasks still occur with glusterfs 3.2.1
Hi all.
We have been having problems with hung tasks of apache reading from glusterfs
2-replica volume
ever since upgrading to 3.2.0. The problems were identical to those described
here:
http://gluster.org/pipermail/gluster-users/2011-May/007697.html
Yesterday we updated to 3.2.1.
A good thing is that the hung tasks stopped appearing when gluster is in
"intact" operation, i.e.
when there are no modifications to the gluster configs at all.
Today we modified some other volume exported by the same cluster (but not
sharing anything with
the volume used by the apache process). And, once again, two requests of apache
reading from
glusterfs volume are stuck.
Any help with this issue would be very appreciated as right now we have to
nightly-reboot the
machine as the processes re stuck in iowait -> unkillable.
I really do not want to go through the downgrade to 3.1.4 since it seems from
the mailing list
that it may not go exactly smooth. We are exporting millions of files and any
large operation on
the exported filesystem takes days.
I am attaching tech info on the problem.
client:
Centos 5.6
2.6.18-238.9.1.el5
fuse-2.7.4-8.el5
glusterfs-fuse-3.2.1-1
glusterfs-core-3.2.1-1
servers:
Centos 5.6
2.6.18-194.32.1.el5
fuse-2.7.4-8.el5
glusterfs-fuse-3.2.1-1
glusterfs-core-3.2.1-1
dmesg:
INFO: task httpd:1246 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D ffff81000101d7a0 0 1246 2394 1247 1191 (NOTLB)
ffff81013ee7dc38 0000000000000082 0000000000000092 ffff81013ee7dcd8
ffff81013ee7dd04 000000000000000a ffff810144d0f7e0 ffff81019fc28100
0000308f8b444727 00000000000014ee ffff810144d0f9c8 000000038006e608
Call Trace:
[<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
[<ffffffff80028c5a>] sync_page+0x0/0x43
[<ffffffff800637ca>] io_schedule+0x3f/0x67
[<ffffffff80028c98>] sync_page+0x3e/0x43
[<ffffffff8006390e>] __wait_on_bit_lock+0x36/0x66
[<ffffffff8003ff27>] __lock_page+0x5e/0x64
[<ffffffff800a2921>] wake_bit_function+0x0/0x23
[<ffffffff8003fd85>] pagevec_lookup+0x17/0x1e
[<ffffffff800cc666>] invalidate_inode_pages2_range+0x73/0x1bd
[<ffffffff8004fc94>] finish_wait+0x32/0x5d
[<ffffffff884b9798>] :fuse:wait_answer_interruptible+0xb6/0xbd
[<ffffffff800a28f3>] autoremove_wake_function+0x0/0x2e
[<ffffffff8009a485>] recalc_sigpending+0xe/0x25
[<ffffffff8001decc>] sigprocmask+0xb7/0xdb
[<ffffffff884bd456>] :fuse:fuse_finish_open+0x36/0x62
[<ffffffff884bda11>] :fuse:fuse_open_common+0x147/0x158
[<ffffffff884bda22>] :fuse:fuse_open+0x0/0x7
[<ffffffff8001eb99>] __dentry_open+0xd9/0x1dc
[<ffffffff8002766e>] do_filp_open+0x2a/0x38
[<ffffffff8001a061>] do_sys_open+0x44/0xbe
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
INFO: task httpd:1837 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D ffff810001004420 0 1837 2394 1856 1289 (NOTLB)
ffff81013c6f9c38 0000000000000086 ffff81013c6f9bf8 00000000fffffffe
ffff810170ce7000 000000000000000a ffff81019c0ae7a0 ffffffff80311b60
0000308c0f83d792 0000000000000ec4 ffff81019c0ae988 000000008006e608
Call Trace:
[<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
[<ffffffff80028c5a>] sync_page+0x0/0x43
[<ffffffff800637ca>] io_schedule+0x3f/0x67
[<ffffffff80028c98>] sync_page+0x3e/0x43
[<ffffffff8006390e>] __wait_on_bit_lock+0x36/0x66
[<ffffffff8003ff27>] __lock_page+0x5e/0x64
[<ffffffff800a2921>] wake_bit_function+0x0/0x23
[<ffffffff8003fd85>] pagevec_lookup+0x17/0x1e
[<ffffffff800cc666>] invalidate_inode_pages2_range+0x73/0x1bd
[<ffffffff8004fc94>] finish_wait+0x32/0x5d
[<ffffffff884b9798>] :fuse:wait_answer_interruptible+0xb6/0xbd
[<ffffffff800a28f3>] autoremove_wake_function+0x0/0x2e
[<ffffffff8009a485>] recalc_sigpending+0xe/0x25
[<ffffffff8001decc>] sigprocmask+0xb7/0xdb
[<ffffffff884bd456>] :fuse:fuse_finish_open+0x36/0x62
[<ffffffff884bda11>] :fuse:fuse_open_common+0x147/0x158
[<ffffffff884bda22>] :fuse:fuse_open+0x0/0x7
[<ffffffff8001eb99>] __dentry_open+0xd9/0x1dc
[<ffffffff8002766e>] do_filp_open+0x2a/0x38
[<ffffffff8001a061>] do_sys_open+0x44/0xbe
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
INFO: task httpd:383 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D ffff81019fa21100 0 383 2394 534 (NOTLB)
ffff81013e497c08 0000000000000082 ffff810183eb8910 ffffffff884b9219
ffff81019e41c600 0000000000000009 ffff81019b1e2100 ffff81019fa21100
0000308c0e2c2bfb 0000000000016477 ffff81019b1e22e8 000000038006e608
Call Trace:
[<ffffffff884b9219>] :fuse:flush_bg_queue+0x2b/0x48
[<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
[<ffffffff8005a9f9>] getnstimeofday+0x10/0x28
[<ffffffff80028c5a>] sync_page+0x0/0x43
[<ffffffff800637ca>] io_schedule+0x3f/0x67
[<ffffffff80028c98>] sync_page+0x3e/0x43
[<ffffffff8006390e>] __wait_on_bit_lock+0x36/0x66
[<ffffffff8003ff27>] __lock_page+0x5e/0x64
[<ffffffff800a2921>] wake_bit_function+0x0/0x23
[<ffffffff8000c48d>] do_generic_mapping_read+0x1df/0x359
[<ffffffff8000d279>] file_read_actor+0x0/0x159
[<ffffffff8000c753>] __generic_file_aio_read+0x14c/0x198
[<ffffffff800c8c45>] generic_file_read+0xac/0xc5
[<ffffffff800a28f3>] autoremove_wake_function+0x0/0x2e
[<ffffffff80130778>] selinux_file_permission+0x9f/0xb4
[<ffffffff8000b78d>] vfs_read+0xcb/0x171
[<ffffffff80011d34>] sys_read+0x45/0x6e
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
possibly relevant log messages:
[2011-06-13 09:09:14.985576] W [socket.c:1494:__socket_proto_state_machine]
0-glusterfs: reading
from socket failed. Error (Transport endpoint is not connected), peer
(81.0.225.122:24007)
[2011-06-13 09:09:25.741055] I [glusterfsd-mgmt.c:637:mgmt_getspec_cbk] 0-: No
change in volfile,
continuing
[2011-06-13 09:59:00.644130] I [afr-common.c:639:afr_lookup_self_heal_check]
0-isifa1-replicate-0:
size differs for /data/foto/thumbs/3140/31409780.jpg
[2011-06-13 09:59:00.644269] I [afr-common.c:801:afr_lookup_done]
0-isifa1-replicate-0: background
meta-data data self-heal triggered. path: /data/foto/thumbs/3140/31409780.jpg
[2011-06-13 09:59:00.822524] W [dict.c:437:dict_ref]
(-->/opt/glusterfs/3.2.1/lib64/glusterfs/3.2.1/xlator/protocol/client.so(client3_1_fstat_cbk+0x2cb)
[0x2aaaab1afa0b]
(-->/opt/glusterfs/3.2.1/lib64/glusterfs/3.2.1/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x17d)
[0x2aaaab3e4c9d]
(-->/opt/glusterfs/3.2.1/lib64/glusterfs/3.2.1/xlator/cluster/replicate.so(afr_sh_data_fix+0x1fc)
[0x2aaaab3e493c]))) 0-dict: dict is NULL
[2011-06-13 09:59:00.824323] I [afr-common.c:639:afr_lookup_self_heal_check]
0-isifa1-replicate-0:
size differs for /data/foto/thumbs/3140/31409781.jpg
[2011-06-13 09:59:00.824356] I [afr-common.c:801:afr_lookup_done]
0-isifa1-replicate-0: background
meta-data data self-heal triggered. path: /data/foto/thumbs/3140/31409781.jpg
[2011-06-13 09:59:00.826494] I
[afr-self-heal-common.c:1557:afr_self_heal_completion_cbk]
0-isifa1-replicate-0: background meta-data data self-heal completed on
/data/foto/thumbs/3140/31409780.jpg
[2011-06-13 09:59:00.830902] W [dict.c:437:dict_ref]
(-->/opt/glusterfs/3.2.1/lib64/glusterfs/3.2.1/xlator/protocol/client.so(client3_1_fstat_cbk+0x2cb)
[0x2aaaab1afa0b]
(-->/opt/glusterfs/3.2.1/lib64/glusterfs/3.2.1/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x17d)
[0x2aaaab3e4c9d]
(-->/opt/glusterfs/3.2.1/lib64/glusterfs/3.2.1/xlator/cluster/replicate.so(afr_sh_data_fix+0x1fc)
[0x2aaaab3e493c]))) 0-dict: dict is NULL
gluster-server volume config:
volume isifa1-posix
type storage/posix
option directory /mnt/data/isifa1
end-volume
volume isifa1-access-control
type features/access-control
subvolumes isifa1-posix
end-volume
volume isifa1-locks
type features/locks
subvolumes isifa1-access-control
end-volume
volume isifa1-io-threads
type performance/io-threads
subvolumes isifa1-locks
end-volume
volume isifa1-marker
type features/marker
option volume-uuid 39d5c509-ad39-4b24-a272-c33e212cf912
option timestamp-file /etc/glusterd/vols/isifa1/marker.tstamp
option xtime off
option quota off
subvolumes isifa1-io-threads
end-volume
volume /mnt/data/isifa1
type debug/io-stats
option latency-measurement off
option count-fop-hits off
subvolumes isifa1-marker
end-volume
volume isifa1-server
type protocol/server
option transport-type tcp
option auth.addr./mnt/data/isifa1.allow
81.0.225.120,81.0.225.121,81.0.225.90,81.0.225.117,81.0.225.118,82.208.17.113
subvolumes /mnt/data/isifa1
end-volume
gluster-fuse config:
volume isifa1-client-0
type protocol/client
option remote-host isifa-data1
option remote-subvolume /mnt/data/isifa1
option transport-type tcp
end-volume
volume isifa1-client-1
type protocol/client
option remote-host isifa-data2
option remote-subvolume /mnt/data/isifa1
option transport-type tcp
end-volume
volume isifa1-replicate-0
type cluster/replicate
subvolumes isifa1-client-0 isifa1-client-1
end-volume
volume isifa1-write-behind
type performance/write-behind
subvolumes isifa1-replicate-0
end-volume
volume isifa1-read-ahead
type performance/read-ahead
subvolumes isifa1-write-behind
end-volume
volume isifa1-io-cache
type performance/io-cache
subvolumes isifa1-read-ahead
end-volume
volume isifa1-quick-read
type performance/quick-read
subvolumes isifa1-io-cache
end-volume
volume isifa1-stat-prefetch
type performance/stat-prefetch
subvolumes isifa1-quick-read
end-volume
volume isifa1
type debug/io-stats
option latency-measurement off
option count-fop-hits off
subvolumes isifa1-stat-prefetch
end-volume
_______________________________________________
Gluster-users mailing list
[email protected] <mailto:[email protected]>
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
--
Mag. Christopher Anderlik
Leiter Technik
________________________________________________________________________________
Xidras GmbH
Stockern 47
3744 Stockern
Austria
Tel: 0043 2983 201 30 5 01
Fax: 0043 2983 201 30 5 01 9
Email: [email protected] <mailto:[email protected]>
Web: http://www.xidras.com <http://www.xidras.com/>
FN 317036 f | Landesgericht Krems | ATU64485024
________________________________________________________________________________
VERTRAULICHE INFORMATIONEN!
Diese eMail enthält vertrauliche Informationen und ist nur für den berechtigten
Empfänger bestimmt. Wenn diese eMail nicht für Sie bestimmt ist, bitten wir Sie,
diese eMail an uns zurückzusenden und anschließend auf Ihrem Computer und
Mail-Server zu löschen. Solche eMails und Anlagen dürfen Sie weder nutzen,
noch verarbeiten oder Dritten zugänglich machen, gleich in welcher Form.
Wir danken für Ihre Kooperation!
CONFIDENTIAL!
This email contains confidential information and is intended for the authorised
recipient only. If you are not an authorised recipient, please return the email
to us and then delete it from your computer and mail-server. You may neither
use nor edit any such emails including attachments, nor make them accessible
to third parties in any manner whatsoever.
Thank you for your cooperation
________________________________________________________________________________
_______________________________________________
Gluster-users mailing list
[email protected] <mailto:[email protected]>
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
[email protected] <mailto:[email protected]>
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
--
Mag. Christopher Anderlik
Leiter Technik
________________________________________________________________________________
Xidras GmbH
Stockern 47
3744 Stockern
Austria
Tel: 0043 2983 201 30 5 01
Fax: 0043 2983 201 30 5 01 9
Email: [email protected]
Web: http://www.xidras.com
FN 317036 f | Landesgericht Krems | ATU64485024
________________________________________________________________________________
VERTRAULICHE INFORMATIONEN!
Diese eMail enthält vertrauliche Informationen und ist nur für den berechtigten
Empfänger bestimmt. Wenn diese eMail nicht für Sie bestimmt ist, bitten wir Sie,
diese eMail an uns zurückzusenden und anschließend auf Ihrem Computer und
Mail-Server zu löschen. Solche eMails und Anlagen dürfen Sie weder nutzen,
noch verarbeiten oder Dritten zugänglich machen, gleich in welcher Form.
Wir danken für Ihre Kooperation!
CONFIDENTIAL!
This email contains confidential information and is intended for the authorised
recipient only. If you are not an authorised recipient, please return the email
to us and then delete it from your computer and mail-server. You may neither
use nor edit any such emails including attachments, nor make them accessible
to third parties in any manner whatsoever.
Thank you for your cooperation
________________________________________________________________________________
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users