-Atin
Sent from one plus one
On Aug 31, 2015 10:34 PM, "Merlin Morgenstern" <merlin.morgenst...@gmail.com> wrote:
>
> Thank you all for your help.
>
> To explain the setup better, here is the goal I am trying to achieve:
>
> - 3 servers running in a cluster, each with a webserver uploading and serving files to visitors from a common glusterfs share.
> - Server1 and Server2 have gluster-server installed
> - One brick replicated between Server1 and Server2 with the goal of achieving High Availability
> - Server1, Server2 and Server3 mount the brick through fuse.
> - Server1 mounts Gluster-Server1 with Server2 as backup. Vice versa for Server2
>
> Now consider the following scenarios:
>
> 1. Server2 dies
>
> In this case Server1 serves as a failover and serves the files for Server1,2,3 until Server2 comes back up again. This works.
>
> 2. Server2 dies. Server1 has to reboot.
>
> In this case the service stays down. It is impossible to remount the share without Server1. This is not acceptable for a High Availability System and I believe also not intended, but a misconfiguration or bug.

This is exactly what I gave as an example in the thread (please read again). GlusterD is not supposed to start the brick process if its counterpart hasn't come up yet in a 2 node setup. It has been designed this way to block GlusterD from operating on a volume which could be stale, as the node was down while the cluster was operational earlier.

>
> Thank you again for looking into this.
>
>
> 2015-08-31 14:10 GMT+02:00 Yiping Peng <barius...@gmail.com>:
>>>
>>> One more thing, when I do this on server1, which has been in the pool for a long time:
>>> server1:~$ mount server1:/vol1 mountpoint
>>> It also fails.
>>> The log gave me:
>>
>>
>> My fault, I used localhost as endpoint.
>>
>> I re-issued "mount -t glusterfs server01:/speech0 qqq"
>> and the log shows a lot of things like:
>>
>> [2015-08-31 12:08:44.801169] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 57, Protocol not available
>> [2015-08-31 12:08:44.801187] E [socket.c:3019:socket_connect] 0-speech0-client-43: Failed to set keep-alive: Protocol not available
>> [2015-08-31 12:08:44.801305] W [socket.c:642:__socket_rwv] 0-speech0-client-43: readv on 10.88.153.25:24007 failed (Connection reset by peer)
>> [2015-08-31 12:08:44.801404] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fcf540db65b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fcf53ea71b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fcf53ea72ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fcf53ea739b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fcf53ea795f] ))))) 0-speech0-client-43: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-08-31 12:08:44.801294 (xid=0x17)
>> [2015-08-31 12:08:44.801423] W [MSGID: 114032] [client-handshake.c:1623:client_dump_version_cbk] 0-speech0-client-43: received RPC status error [Transport endpoint is not connected]
>> [2015-08-31 12:08:44.801440] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-speech0-client-43: disconnected from speech0-client-43. Client process will keep trying to connect to glusterd until brick's port is available
>> [2015-08-31 12:08:44.804488] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 57, Protocol not available
>> [2015-08-31 12:08:44.804505] E [socket.c:3019:socket_connect] 0-speech0-client-51: Failed to set keep-alive: Protocol not available
>> [2015-08-31 12:08:44.804775] W [socket.c:642:__socket_rwv] 0-speech0-client-51: readv on 10.88.146.19:24007 failed (Connection reset by peer)
>> [2015-08-31 12:08:44.804878] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fcf540db65b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fcf53ea71b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fcf53ea72ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fcf53ea739b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fcf53ea795f] ))))) 0-speech0-client-51: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-08-31 12:08:44.804693 (xid=0x18)
>> [2015-08-31 12:08:44.804898] W [MSGID: 114032] [client-handshake.c:1623:client_dump_version_cbk] 0-speech0-client-51: received RPC status error [Transport endpoint is not connected]
>> [2015-08-31 12:08:44.804917] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-speech0-client-51: disconnected from speech0-client-51. Client process will keep trying to connect to glusterd until brick's port is available
>>
>>
>> 2015-08-31 20:06 GMT+08:00 Yiping Peng <barius...@gmail.com>:
>>>
>>>
>>>> I believe the following events have happened in the cluster resulting
>>>> into this situation:
>>>> 1. GlusterD & brick process on node 2 was brought down
>>>> 2. Node 1 was rebooted.
>>>
>>> Strangely enough, glusterfs, glusterd and glusterfsd are running on my server. Is glusterfsd the brick process? Also server01 has not been rebooted during the whole process.
>>>
>>> glusterfsd has the following arguments:
>>> /usr/sbin/glusterfsd -s server01.local.net --volfile-id speech0.server01.local.net.home-glusterfs-speech0-brick0 -p /var/lib/glusterd/vols/speech0/run/server01.local.net-home-glusterfs-speech0-brick0.pid -S /var/run/gluster/6bf40a98deade9dde8b615226bc57567.socket --brick-name /home/glusterfs/speech0/brick0 -l /var/log/glusterfs/bricks/home-glusterfs-speech0-brick0.log --xlator-option *-posix.glusterd-uuid=1c33ff18-2a6a-44cf-9a04-727fc96e92be --brick-port 49159 --xlator-option speech0-server.listen-port=49159
>>>
>>> One more thing, when I do this on server1, which has been in the pool for a long time:
>>> server1:~$ mount server1:/vol1 mountpoint
>>> It also fails.
>>> The log gave me:
>>>
>>> [2015-08-31 11:56:57.123307] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3 (args: /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/speech0 qqq)
>>> [2015-08-31 11:56:57.134642] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 9, Protocol not available
>>> [2015-08-31 11:56:57.134688] E [socket.c:3019:socket_connect] 0-glusterfs: Failed to set keep-alive: Protocol not available
>>> [2015-08-31 11:56:57.135063] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>> [2015-08-31 11:56:57.135113] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection reset by peer)
>>> [2015-08-31 11:56:57.135149] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
>>> [2015-08-31 11:56:57.135158] I [glusterfsd-mgmt.c:1825:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
>>> [2015-08-31 11:56:57.135333] W [glusterfsd.c:1219:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3) [0x7fb5e1be39a3] -->/usr/sbin/glusterfs() [0x4099c8] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (1), shutting down
>>> [2015-08-31 11:56:57.135371] I [fuse-bridge.c:5595:fini] 0-fuse: Unmounting '/home/speech/pengyiping/qqq'.
>>> [2015-08-31 11:56:57.140640] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0() [0x318b207851] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (15), shutting down
>>>
>>>
>>> Any help is much appreciated.
>>>
>>>
>>> 2015-08-31 19:15 GMT+08:00 Atin Mukherjee <amukh...@redhat.com>:
>>>>
>>>> I believe the following events have happened in the cluster resulting
>>>> into this situation:
>>>> 1. GlusterD & brick process on node 2 was brought down
>>>> 2. Node 1 was rebooted.
>>>>
>>>> In the above case the mount will definitely fail since the brick process
>>>> was not started, as in a 2 node setup glusterd waits for its peers to come
>>>> up before it starts the bricks. Could you check whether the brick
>>>> process is running or not?
>>>>
>>>> Thanks,
>>>> Atin
>>>>
>>>> On 08/31/2015 04:17 PM, Yiping Peng wrote:
>>>> > I've tried both: assuming server1 is already in pool, server2 is undergoing
>>>> > peer-probing
>>>> >
>>>> > server2:~$ mount server1:/vol1 mountpoint, fail;
>>>> > server2:~$ mount server2:/vol1 mountpoint, fail.
>>>> >
>>>> > Strangely enough. I *should* be able to mount server1:/vol1 on server2. But
>>>> > this is not the case :(
>>>> > Maybe something is broken in the server pool, as I'm seeing disconnected
>>>> > nodes?
>>>> >
>>>> >
>>>> > 2015-08-31 18:02 GMT+08:00 Ravishankar N <ravishan...@redhat.com>:
>>>> >
>>>> >>
>>>> >>
>>>> >> On 08/31/2015 12:53 PM, Merlin Morgenstern wrote:
>>>> >>
>>>> >> Trying to mount the brick on the same physical server with the daemon running
>>>> >> on this server but not on the other server:
>>>> >>
>>>> >> @node2:~$ sudo mount -t glusterfs gs2:/volume1 /data/nfs
>>>> >> Mount failed. Please check the log file for more details.
>>>> >>
>>>> >> For mount to succeed the glusterd must be up on the node that you specify
>>>> >> as the volfile-server; gs2 in this case. You can use -o
>>>> >> backupvolfile-server=gs1 as a fallback.
>>>> >> -Ravi
>>>> >>
>>>> >> _______________________________________________
>>>> >> Gluster-users mailing list
>>>> >> Gluster-users@gluster.org
>>>> >> http://www.gluster.org/mailman/listinfo/gluster-users
>>>> >>
>>>> >
>>>
>>
>
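
For reference, Ravi's suggestion of a fallback volfile server can be sketched as below. This is only an illustration, assuming the host names gs1 and gs2, the volume volume1 and the mount point /data/nfs from Merlin's example. The helper name gluster_mount_cmd is hypothetical and not part of GlusterFS; it only prints the mount command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): prints the mount command
# for a primary volfile server with one backup, without executing it.
# Arguments: primary host, backup host, volume name, mount point.
gluster_mount_cmd() {
    printf 'mount -t glusterfs -o backupvolfile-server=%s %s:/%s %s\n' \
        "$2" "$1" "$3" "$4"
}

# Values taken from the thread: gs2 is the volfile-server, gs1 the fallback.
gluster_mount_cmd gs2 gs1 volume1 /data/nfs
# prints: mount -t glusterfs -o backupvolfile-server=gs1 gs2:/volume1 /data/nfs
```

Note that, going by Atin's explanation in this thread, the fallback only helps when glusterd on the primary volfile server is unreachable. It would not make the mount succeed in the case where glusterd has not started the brick process because its peer is still down.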