OK, using glusterfs-volgen I rebuilt the config files and got the cluster working again.
Performance has improved from 1.6 to about 24; NFS on the same machines is at about 11, and going straight to the disks is 170+.
There still seems to be a HUGE performance loss from using a network file system.
I must still be doing something wrong, because I have several problems:
1. If I create a file on the client while one of the servers is down, the file is still missing when that server comes back up.
I thought the servers were supposed to be replicated; why don't they re-sync? (See the self-heal command after this list.)
2. If server 1 is down and server 2 is up, it takes about 30 seconds to do a df on the client. That is WAY too long; lots of my applications will time out in less than that.
How do I get the client to behave the same when one of the servers is down? (See the ping-timeout note after this list.)
3. On the server, if the network is down and I try to do a df, it hangs my shell (the server has the glusterfs volume mounted as a client).
This is typical behavior for NFS too; is there any way to time out instead of hanging? (See the timeout wrapper after this list.)
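A few things I am trying for these, based on what I have read; treat the option names, defaults, and paths here as my assumptions, not confirmed facts.

For #1, my understanding is that replicate only heals a file when it gets looked up, so after the dead server comes back I force a lookup of everything from the client mount to trigger self-heal:

  find /mnt/glusterfs -noleaf -print0 | xargs --null stat > /dev/null

(/mnt/glusterfs is just my client mount point; adjust to yours.)

For #2, I am experimenting with a shorter ping timeout so a dead server is declared down faster, by adding this to each protocol/client volume in client.vol below (assuming my release supports the option):

  option ping-timeout 10   # my assumption: seconds of silence before the server is marked down

For #3, as a stop-gap I wrap df in a timeout (GNU coreutils) so my shell at least gets control back:

  timeout 5 df -h /mnt/glusterfs || echo "df timed out"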
Here are my new config files:
----- server.vol -----
volume tcb_posix
type storage/posix
option directory /mnt/tcb_data
end-volume
volume tcb_locks
type features/locks
subvolumes tcb_posix
end-volume
volume tcb_brick
type performance/io-threads
option thread-count 8
subvolumes tcb_locks
end-volume
volume tcb_server
type protocol/server
option transport-type tcp
option auth.addr.tcb_brick.allow *
option transport.socket.listen-port 50001
option transport.socket.nodelay on
subvolumes tcb_brick
end-volume
----------------------------------------
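For reference, I start the server daemon on each box with this volfile like so (the paths are just where I happen to keep things):

  glusterfsd -f /etc/glusterfs/server.vol -l /var/log/glusterfs/server.log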
----- client.vol -----
volume tcb_remote_glust1
type protocol/client
option transport-type tcp
option remote-host x.x.x.x
option transport.socket.nodelay on
option transport.remote-port 50001
option remote-subvolume tcb_brick
end-volume
volume tcb_remote_glust2
type protocol/client
option transport-type tcp
option remote-host y.y.y.y
option transport.socket.nodelay on
option transport.remote-port 50001
option remote-subvolume tcb_brick
end-volume
volume tcb_mirror
type cluster/replicate
subvolumes tcb_remote_glust1 tcb_remote_glust2
end-volume
volume tcb_writebehind
type performance/write-behind
option cache-size 4MB
subvolumes tcb_mirror
end-volume
volume tcb_readahead
type performance/read-ahead
option page-count 4
subvolumes tcb_writebehind
end-volume
volume tcb_iocache
type performance/io-cache
option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
option cache-timeout 1
subvolumes tcb_readahead
end-volume
volume tcb_quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes tcb_iocache
end-volume
volume tcb_statprefetch
type performance/stat-prefetch
subvolumes tcb_quickread
end-volume
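And I mount it on the clients with (again, my paths; the backtick in the io-cache cache-size above just works out to roughly 20% of total RAM, in MB):

  glusterfs -f /etc/glusterfs/client.vol /mnt/glusterfs

or, I believe, the equivalent fstab entry:

  /etc/glusterfs/client.vol  /mnt/glusterfs  glusterfs  defaults  0  0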
^C
Chad wrote:
Ok, I tried to change over to this, but now I just get:
[2010-02-24 09:30:41] E [authenticate.c:234:gf_authenticate] auth: no authentication module is interested in accepting remote-client 10.0.0.24:1007
[2010-02-24 09:30:41] E [server-protocol.c:5822:mop_setvolume] tcb_remote: Cannot authenticate client from 10.0.0.24:1007
I am sure it is something simple; I just don't know what.
Is this a port problem? The port in the log is 1007, but the server is on 50001.
Here are my config files:
----- server.vol: -----
volume tcb_posix-export
type storage/posix
option directory /mnt/tcb_data
end-volume
volume tcb_locks-export
type features/locks
subvolumes tcb_posix-export
end-volume
volume tcb_export
type performance/io-threads
option thread-count 8
subvolumes tcb_locks-export
end-volume
volume tcb_remote
type protocol/server
option transport-type tcp
option transport.socket.listen-port 50001
option transport.socket.nodelay on
subvolumes tcb_export tcb_locks-export
option auth.ip.tcb_locks-export.allow 10.0.0.*,10.0.20.*,10.0.30.*,192.168.1.*,192.168.20.*,192.168.30.*,127.0.0.1
option auth.ip.tcb_export.allow 10.0.0.*,10.0.20.*,10.0.30.*,192.168.1.*,192.168.20.*,192.168.30.*,127.0.0.1
end-volume
----- client.vol -----
volume tcb_remote1
type protocol/client
option transport-type tcp
option remote-port 50001
option remote-host 10.0.0.24
option remote-subvolume tcb_remote
end-volume
volume tcb_remote2
type protocol/client
option transport-type tcp
option remote-port 50001
option remote-host 10.0.0.25
option remote-subvolume tcb_remote
end-volume
volume tcb_mirror
type cluster/afr
subvolumes tcb_remote1 tcb_remote2
end-volume
volume tcb_wb
type performance/write-behind
option cache-size 1MB
subvolumes tcb_mirror
end-volume
volume tcb_ioc
type performance/io-cache
option cache-size 32MB
subvolumes tcb_wb
end-volume
volume tcb_iothreads
type performance/io-threads
option thread-count 16
subvolumes tcb_ioc
end-volume
^C
Chad wrote:
I finally got the servers transported 2000 miles, set up, wired, and booted.
Here are the vol files.
Just to reiterate, the issues are slow read/write performance and clients hanging when one server goes down.
### glusterfs.vol ###
############################################
# Start tcb_cluster
############################################
# the exported volume to mount # required!
volume tcb_cluster
type protocol/client
option transport-type tcp/client
option remote-host glustcluster
option remote-port 50001
option remote-subvolume tcb_cluster
end-volume
############################################
# Start cs_cluster
############################################
# the exported volume to mount # required!
volume cs_cluster
type protocol/client
option transport-type tcp/client
option remote-host glustcluster
option remote-port 50002
option remote-subvolume cs_cluster
end-volume
############################################
# Start pbx_cluster
############################################
# the exported volume to mount # required!
volume pbx_cluster
type protocol/client
option transport-type tcp/client
option remote-host glustcluster
option remote-port 50003
option remote-subvolume pbx_cluster
end-volume
---------------------------------------------------
### glusterfsd.vol ###
#############################################
# Start tcb_data cluster
#############################################
volume tcb_local
type storage/posix
option directory /mnt/tcb_data
end-volume
volume tcb_locks
type features/locks
option mandatory-locks on # enables mandatory locking on all files
subvolumes tcb_local
end-volume
# dataspace on remote machine, look in /etc/hosts to see that
volume tcb_locks_remote
type protocol/client
option transport-type tcp
option remote-port 50001
option remote-host 192.168.1.25
option remote-subvolume tcb_locks
end-volume
# automatic file replication translator for dataspace
volume tcb_cluster_afr
type cluster/replicate
subvolumes tcb_locks tcb_locks_remote
end-volume
# the actual exported volume
volume tcb_cluster
type performance/io-threads
option thread-count 256
option cache-size 128MB
subvolumes tcb_cluster_afr
end-volume
volume tcb_cluster_server
type protocol/server
option transport-type tcp
option transport.socket.listen-port 50001
option auth.addr.tcb_locks.allow *
option auth.addr.tcb_cluster.allow *
option transport.socket.nodelay on
subvolumes tcb_cluster
end-volume
#############################################
# Start cs_data cluster
#############################################
volume cs_local
type storage/posix
option directory /mnt/cs_data
end-volume
volume cs_locks
type features/locks
option mandatory-locks on # enables mandatory locking on all files
subvolumes cs_local
end-volume
# dataspace on remote machine, look in /etc/hosts to see that
volume cs_locks_remote
type protocol/client
option transport-type tcp
option remote-port 50002
option remote-host 192.168.1.25
option remote-subvolume cs_locks
end-volume
# automatic file replication translator for dataspace
volume cs_cluster_afr
type cluster/replicate
subvolumes cs_locks cs_locks_remote
end-volume
# the actual exported volume
volume cs_cluster
type performance/io-threads
option thread-count 256
option cache-size 128MB
subvolumes cs_cluster_afr
end-volume
volume cs_cluster_server
type protocol/server
option transport-type tcp
option transport.socket.listen-port 50002
option auth.addr.cs_locks.allow *
option auth.addr.cs_cluster.allow *
option transport.socket.nodelay on
subvolumes cs_cluster
end-volume
#############################################
# Start pbx_data cluster
#############################################
volume pbx_local
type storage/posix
option directory /mnt/pbx_data
end-volume
volume pbx_locks
type features/locks
option mandatory-locks on # enables mandatory locking on all files
subvolumes pbx_local
end-volume
# dataspace on remote machine, look in /etc/hosts to see that
volume pbx_locks_remote
type protocol/client
option transport-type tcp
option remote-port 50003
option remote-host 192.168.1.25
option remote-subvolume pbx_locks
end-volume
# automatic file replication translator for dataspace
volume pbx_cluster_afr
type cluster/replicate
subvolumes pbx_locks pbx_locks_remote
end-volume
# the actual exported volume
volume pbx_cluster
type performance/io-threads
option thread-count 256
option cache-size 128MB
subvolumes pbx_cluster_afr
end-volume
volume pbx_cluster_server
type protocol/server
option transport-type tcp
option transport.socket.listen-port 50003
option auth.addr.pbx_locks.allow *
option auth.addr.pbx_cluster.allow *
option transport.socket.nodelay on
subvolumes pbx_cluster
end-volume
--
^C
Smart Weblications GmbH - Florian Wiessner wrote:
Hi,
On 16.02.2010 01:58, Chad wrote:
I am new to glusterfs and to this list, so please let me know if I have made any mistakes in posting this; I am not sure what your standards are.
I came across glusterfs last week; it was super easy to set up and test, and it is almost exactly what I want/need.
I set up two "glusterfs servers" that serve a mirrored RAID 5 disk, partitioned into three 500GB partitions, to six clients.
I am using round-robin DNS, but I also tried heartbeat and ldirectord (see details below).
Each server has two NICs: one for the clients, and the other with a crossover cable connecting the two servers. Both NICs are 1000Mbps.
There are only two issues.
#1. When one of the servers goes down, the clients hang at least for a little while (more testing is needed); I am not sure if the clients can recover at all.
#2. The read/write tests I performed came in at 1.6 when using glusterfs, NFS on all the same machines came in at 11, and a direct test on the data server came in at 111. How do I improve the performance?
Please share your vol files. I don't understand why you would need load balancers.
###############################################
My glusterfs set-up:
2 Supermicro servers with dual 3.0GHz Xeon CPUs, 8GB RAM, and 4 x 750GB Seagate SATA HDs, 3 in RAID 5 with 1 hot spare (the data servers).
Why not use RAID 10? Same capacity, better speed.
6 Supermicro servers with dual 2.8GHz AMD CPUs, 4GB RAM, and 2 x 250GB Seagate SATA HDs in RAID 1 (the client machines).
glusterfs is set up with round-robin DNS to handle the load balancing of the two data servers.
AFAIK there is no need to set up DNS round robin or load balancing for the gluster servers; glusterfs should take care of that itself. But without your volfiles I can't give any hints.
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users