Hi

Yes, I have 2 servers and 2 clients.

Each client connects to both servers, and both clients use exactly the same config.

If server1 goes down, client1 breaks (as described in my mail below) instead of simply continuing to work against server2.

Is that enough info?


Adrian Moisey
Systems Designer | CareerJunction | Better jobs. More often.
Web: www.careerjunction.co.za | Email: [email protected]
Phone: +27 21 818 8621 | Mobile: +27 82 858 7830 | Fax: +27 21 818 8855


Pavan Vilas Sondur wrote:
Hi Adrian,
Correct me if I've got you wrong: you have 2 servers, and a client replicates
to both servers. If the first server is down, the client also stops
responding. You mentioned more than one client; can you clarify this so that
we can understand the issue?

Pavan

On 01/10/09 08:41 +0200, Adrian Moisey wrote:
Hi

I am currently testing GlusterFS with replication.
I am running Ubuntu Hardy with packages from the PPA on launchpad.net, currently GlusterFS 2.0.6.

I have 2 machines, both exporting 1 brick each. This is the config I'm using:
----8<----8<----8<----8<----8<----8<----8<----8<----8<----
volume posix
  type storage/posix
  option directory /home/export/
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume cache
  type performance/io-cache
  subvolumes locks
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes cache
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp
  subvolumes brick
  option auth.addr.brick.allow * # Allow access to "brick" volume
end-volume
----8<----8<----8<----8<----8<----8<----8<----8<----8<----
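To make the translator stack in this server volfile explicit, here is a small Python sketch that parses the file above and walks it from the top volume down. It assumes only the simple volume / type / option / subvolumes / end-volume grammar used in this thread, plus the usual convention that the last volume defined is the top of the graph; parse_volfile and stack are hypothetical helpers, not part of GlusterFS.

```python
# Hypothetical helpers for illustration only; this is not GlusterFS's own parser.
# Assumes the simple volume/type/option/subvolumes/end-volume grammar shown above.

SERVER_VOLFILE = """
volume posix
  type storage/posix
  option directory /home/export/
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume cache
  type performance/io-cache
  subvolumes locks
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes cache
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp
  subvolumes brick
  option auth.addr.brick.allow * # Allow access to "brick" volume
end-volume
"""


def parse_volfile(text):
    """Return ({name: {"type", "options", "subvolumes"}}, names in file order)."""
    volumes, order, current = {}, [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        key, *args = line.split()
        if key == "volume":
            current = args[0]
            volumes[current] = {"type": None, "options": {}, "subvolumes": []}
            order.append(current)
        elif key == "end-volume":
            current = None
        elif key == "type":
            volumes[current]["type"] = args[0]
        elif key == "option":
            volumes[current]["options"][args[0]] = " ".join(args[1:])
        elif key == "subvolumes":
            volumes[current]["subvolumes"] = args
    return volumes, order


def stack(volumes, top):
    """Follow the first subvolume from `top` down to the storage volume."""
    chain, name = [], top
    while name is not None:
        chain.append(f"{name} ({volumes[name]['type']})")
        subs = volumes[name]["subvolumes"]
        name = subs[0] if subs else None
    return chain


if __name__ == "__main__":
    vols, order = parse_volfile(SERVER_VOLFILE)
    print(" -> ".join(stack(vols, order[-1])))
```

Run as a script, it prints server (protocol/server) -> brick (performance/io-threads) -> cache (performance/io-cache) -> locks (features/locks) -> posix (storage/posix), i.e. a request passes through io-threads, io-cache and locks before reaching the exported directory.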

I then have 2 clients (which happen to be the same 2 machines) that connect to both bricks and replicate them using this config:

----8<----8<----8<----8<----8<----8<----8<----8<----8<----
### Add client feature and attach to remote subvolume of server1
volume brick1
 type protocol/client
 option transport-type tcp
 option remote-host 172.19.45.102      # IP address of the remote brick
 option remote-subvolume brick        # name of the remote volume
end-volume

### Add client feature and attach to remote subvolume of server2
volume brick2
 type protocol/client
 option transport-type tcp
 option remote-host 172.19.45.103      # IP address of the remote brick
 option remote-subvolume brick        # name of the remote volume
end-volume

volume replicate
 type cluster/replicate
 subvolumes brick1 brick2
end-volume
----8<----8<----8<----8<----8<----8<----8<----8<----8<----
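The behaviour expected here can be sketched as a toy Python model of a two-way replicate volume: writes go to every child that is currently up, reads are served by any live child, and only when no child is reachable does an operation fail with ENOTCONN ("Transport endpoint is not connected"). ToyReplicate is hypothetical and deliberately ignores what the real cluster/replicate translator also does (self-heal, locking, changelog bookkeeping); it only illustrates the failover semantics.

```python
import errno
import os


class ToyReplicate:
    """Hypothetical model of replicate failover; not the real AFR translator."""

    def __init__(self, children):
        self.up = {name: True for name in children}    # connection state per child
        self.files = {name: {} for name in children}   # path -> content per child

    def set_up(self, child, is_up):
        self.up[child] = is_up

    def _live(self):
        live = [name for name, ok in self.up.items() if ok]
        if not live:
            # Every child is down: the mount as a whole is unusable.
            raise OSError(errno.ENOTCONN, os.strerror(errno.ENOTCONN))
        return live

    def write(self, path, content):
        for child in self._live():                     # replicate to all live children
            self.files[child][path] = content

    def read(self, path):
        for child in self._live():                     # any live copy will do
            if path in self.files[child]:
                return self.files[child][path]
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path)


if __name__ == "__main__":
    vol = ToyReplicate(["brick1", "brick2"])
    vol.write("/hello.txt", "hi")
    vol.set_up("brick1", False)      # simulate server1 going down
    print(vol.read("/hello.txt"))    # still served by brick2
```

Under this model client1 would keep working against brick2 after server1 stops, whereas the mount in this thread returned ENOTCONN as a whole.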

If I start the 2 servers and then mount both clients, everything works fine: I have shared storage that is replicated to each host.

If I shut down one of the bricks, the client on that same machine also dies and I get these strange errors:
----8<----8<----8<----8<----8<----8<----8<----8<----8<----
# cd /mnt/gluster
bash: cd: /mnt/gluster: Transport endpoint is not connected
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.5G  1.1G  7.9G  13% /
varrun                125M   68K  125M   1% /var/run
varlock               125M     0  125M   0% /var/lock
udev                  125M   44K  125M   1% /dev
devshm                125M     0  125M   0% /dev/shm
df: `/mnt/gluster': Transport endpoint is not connected
# mount
/dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
/sys on /sys type sysfs (rw,noexec,nosuid,nodev)
varrun on /var/run type tmpfs (rw,noexec,nosuid,nodev,mode=0755)
varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
udev on /dev type tmpfs (rw,mode=0755)
devshm on /dev/shm type tmpfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
securityfs on /sys/kernel/security type securityfs (rw)
/etc/glusterfs/glusterfs.vol on /mnt/gluster type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
----8<----8<----8<----8<----8<----8<----8<----8<----8<----
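A dead FUSE mount like this one still appears in the mount table while any access to it fails with ENOTCONN. A minimal Python sketch for spotting that state, assuming a Linux host where /proc/mounts lists the mount with type fuse.glusterfs as the mount output above does; glusterfs_mounts and mount_state are hypothetical helper names.

```python
import errno
import os


def glusterfs_mounts(mounts_text):
    """Return mount points of type fuse.glusterfs from /proc/mounts-style text
    (fields: device, mount point, type, options, dump, pass)."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "fuse.glusterfs":
            points.append(fields[1])
    return points


def mount_state(path):
    """'ok' if the path can be stat()ed, 'disconnected' on ENOTCONN
    ("Transport endpoint is not connected"); re-raise any other error."""
    try:
        os.stat(path)
        return "ok"
    except OSError as e:
        if e.errno == errno.ENOTCONN:
            return "disconnected"
        raise


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for point in glusterfs_mounts(f.read()):
            print(point, mount_state(point))
```

On the client in this thread, /mnt/gluster would be listed as a glusterfs mount but classified as disconnected, matching the cd and df errors above.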

Here is a copy of the debug log:
[2009-10-01 08:16:15] D [glusterfsd.c:354:_get_specfp] glusterfs: loading volume file /etc/glusterfs/glusterfs.vol
================================================================================
Version      : glusterfs 2.0.6 built on Aug 31 2009 20:14:31
TLA Revision : v2.0.6
Starting Time: 2009-10-01 08:16:15
Command line : glusterfs --log-level=DEBUG --volfile=/etc/glusterfs/glusterfs.vol /mnt/gluster/
PID          : 17884
System name  : Linux
Nodename     : cj-cpt-molb01
Kernel Release : 2.6.24-24-server
Hardware Identifier: i686

Given volfile:
+------------------------------------------------------------------------------+
  1: ### Add client feature and attach to remote subvolume of server1
  2: volume brick1
  3:  type protocol/client
  4:  option transport-type tcp
  5:  option remote-host 172.19.45.102      # IP address of the remote brick
  6:  option remote-subvolume brick        # name of the remote volume
  7: end-volume
  8:
  9: ### Add client feature and attach to remote subvolume of server2
 10: volume brick2
 11:  type protocol/client
 12:  option transport-type tcp
 13:  option remote-host 172.19.45.103      # IP address of the remote brick
 14:  option remote-subvolume brick        # name of the remote volume
 15: end-volume
 16:
 17: volume replicate
 18:  type cluster/replicate
 19:  subvolumes brick1 brick2
 20: end-volume

+------------------------------------------------------------------------------+
[2009-10-01 08:16:15] D [glusterfsd.c:1205:main] glusterfs: running in pid 17884
[2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick1: defaulting frame-timeout to 30mins
[2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick1: defaulting ping-timeout to 10
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick2: defaulting frame-timeout to 30mins
[2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick2: defaulting ping-timeout to 10
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] N [glusterfsd.c:1224:main] glusterfs: Successfully started
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] brick1: Connected to 172.19.45.102:6996, attached to remote volume 'brick'.
[2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume 'brick1' came back up; going online.
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] brick1: Connected to 172.19.45.102:6996, attached to remote volume 'brick'.
[2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume 'brick1' came back up; going online.
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] brick2: Connected to 172.19.45.103:6996, attached to remote volume 'brick'.
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] brick2: Connected to 172.19.45.103:6996, attached to remote volume 'brick'.
[2009-10-01 08:17:24] N [client-protocol.c:6246:notify] brick1: disconnected
[2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1: connection to 172.19.45.102:6996 failed (Connection refused)
[2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1: connection to 172.19.45.102:6996 failed (Connection refused)
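To pull the per-brick connection history out of a log like this, a short Python sketch can group connect, disconnect and failure messages by translator name. The line layout in LOG_RE is an assumption inferred from the excerpt above, and connection_events is a hypothetical helper; pass it the lines of the client log file.

```python
import re
from collections import defaultdict

# Assumed line layout, inferred from the excerpt above:
# [timestamp] LEVEL [source-file:line:function] translator: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] (?P<xlator>[^:]+): (?P<msg>.*)$"
)
KEYWORDS = ("Connected to", "disconnected", "failed")

SAMPLE = [
    "[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] brick1: Connected to 172.19.45.102:6996, attached to remote volume 'brick'.",
    "[2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume 'brick1' came back up; going online.",
    "[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] brick2: Connected to 172.19.45.103:6996, attached to remote volume 'brick'.",
    "[2009-10-01 08:17:24] N [client-protocol.c:6246:notify] brick1: disconnected",
    "[2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1: connection to 172.19.45.102:6996 failed (Connection refused)",
]


def connection_events(lines):
    """Group connect/disconnect/failure messages by translator name."""
    events = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and any(k in m.group("msg") for k in KEYWORDS):
            events[m.group("xlator")].append(
                (m.group("ts"), m.group("level"), m.group("msg"))
            )
    return dict(events)


if __name__ == "__main__":
    for xlator, evs in connection_events(SAMPLE).items():
        for ts, level, msg in evs:
            print(ts, level, xlator, msg)
```

Applied to the full excerpt above, brick1 shows its connects, the disconnect at 08:17:24 and the refused reconnects at 08:17:27, while brick2 shows only connects.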



Any ideas?


_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
