On 12/15/2011 04:32 PM, Changliang Chen wrote:
Hi pranithk,
Thanks for your reply.
To keep the service available, we did not strace the process. After
shutting down the daemon, the cluster recovered.
In our case:
10.1.1.64 (dfs-client-6): the online node. When the other node (65)
restarts, its CPU usr usage reaches 100% (glusterfsd process).
10.1.1.65 (dfs-client-7): the offline node. When it restarts, the client
NFS mount point becomes unavailable.
The nfs.log shows that the issue is caused by the high CPU usage on
client-6; there are lots of errors like:
[2011-12-14 13:25:53.30308] E [rpc-clnt.c:197:call_bail] 0-19loudfs-client-6: bailing out frame type(GlusterFS 3.1) op(XATTROP(33)) xid = 0x89279937x sent = 2011-12-14 13:25:20.346007. timeout = 30
On Wed, Dec 14, 2011 at 6:49 PM, Pranith Kumar K <[email protected]> wrote:
On 12/14/2011 03:06 PM, Changliang Chen wrote:
Hi, we have used GlusterFS for two years. After upgrading to
3.2.5, we discovered that when one of the replicate nodes reboots and
starts the glusterd daemon, GlusterFS crashes because CPU usage on the
other replicate node reaches 100%.
Our gluster info:
Type: Distributed-Replicate
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Options Reconfigured:
performance.cache-size: 3GB
performance.cache-max-file-size: 512KB
network.frame-timeout: 30
network.ping-timeout: 25
cluster.min-free-disk: 10%
Our hardware:
Dell R710
6 x 600 GB SAS disks
3 x 8 GB memory
The error info:
[2011-12-14 13:24:10.483812] E [rdma.c:4813:init] 0-rdma.management: Failed to initialize IB Device
[2011-12-14 13:24:10.483828] E [rpc-transport.c:742:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2011-12-14 13:24:10.483841] W [rpcsvc.c:1288:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2011-12-14 13:24:11.967621] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2011-12-14 13:24:11.967665] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2011-12-14 13:24:11.967681] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-2
[2011-12-14 13:24:11.967695] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-3
[2011-12-14 13:24:11.967709] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-4
[2011-12-14 13:24:11.967723] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-5
[2011-12-14 13:24:11.967736] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-6
[2011-12-14 13:24:11.967750] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-7
[2011-12-14 13:24:11.967764] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-8
[2011-12-14 13:24:11.967777] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-9
[2011-12-14 13:24:12.465565] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.17:1013)
[2011-12-14 13:24:12.465623] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.8:1013)
[2011-12-14 13:24:12.465656] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.10:1013)
[2011-12-14 13:24:12.465686] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.11:1013)
[2011-12-14 13:24:12.465716] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.125:1013)
[2011-12-14 13:24:12.633288] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.65:1006)
[2011-12-14 13:24:13.138150] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.1:1013)
[2011-12-14 13:24:13.284665] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.3:1013)
[2011-12-14 13:24:15.790805] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.8:1013)
[2011-12-14 13:24:16.113430] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.125:1013)
[2011-12-14 13:24:16.259040] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.10:1013)
[2011-12-14 13:24:16.392058] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.17:1013)
[2011-12-14 13:24:16.429444] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.11:1013)
[2011-12-14 13:26:05.787680] W [glusterfsd.c:727:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x37c8ed3c2d] (-->/lib64/libpthread.so.0 [0x37c96064a7] (-->/opt/glusterfs/3.2.5/sbin/glusterd(glusterfs_sigwaiter+0x17c) [0x40477c]))) 0-: received signum (15), shutting down
--
Regards,
Cocl
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Hi Changliang,
Could you specify which process crashed: glusterd or glusterfs?
Could you provide the stack trace from its respective log file?
I don't see any stack trace in the logs you have provided.
Pranith
--
Regards,
Cocl
OM manager
19lou Operation & Maintenance Dept
Could you send the logs from all the machines? We will check them and get
back to you.
Pranith