That seems correct, with one change: not only do I get the old file in step 5,
but that old file also overwrites the newer file on the node that did not go down.
> 1) What versions are you using?
glusterfs 3.0.2 built on Feb 7 2010 00:15:44
Repository revision: v3.0.2
> 2) Can you share your volume files? Are they generated using volgen?
I did generate them via volgen, but then modified them because I have 3 shares;
the only modifications were renames.
(vol files at end of e-mail)
> 3) Did you notice any patterns for the files where the wrong copy was picked?
> Like, were they open when the node was brought down?
I was not monitoring this.
> 4) Any other way to reproduce the problem?
See my nfs issue below, although I don't think they are related.
> 5) Any other patterns you observed when you see the problem?
See my nfs issue below, although I don't think they are related.
> 6) Would you have listings of problem file(s) from the replica nodes?
No.
Also, I did something today that works over NFS but does not work with Gluster.
I have a share mounted on /cs_data.
I have two directories in that share: /cs_data/web and /cs_data/home.
I moved /cs_data/web into /cs_data/home (giving /cs_data/home/web), then
symlinked /cs_data/web to /cs_data/home/web, like this:
cd /cs_data
mv web home
ln -s home/web web
On all the clients, /cs_data/web does not work anymore.
If I unmount and remount, it works again.
Unfortunately, for the unmount/mount to work I have to kill things like httpd.
So to do a simple directory move (because I had it in the wrong place), I have
to kill my service.
I have done exactly this with an NFS mount and it did not fail at all; I did
not have to kill httpd and I did not have to unmount/remount the share.
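The move-then-symlink sequence itself is plain POSIX and succeeds on a local
filesystem, so the failure above appears specific to how the Gluster clients
cache the old directory. A self-contained version of the sequence, with a
hypothetical scratch directory standing in for the /cs_data mount:

```shell
# Reproduce the move + symlink steps on a scratch directory.
# On the real setup, "$root" would be the GlusterFS mount /cs_data.
root=$(mktemp -d)
mkdir -p "$root/web" "$root/home"
echo hello > "$root/web/index.html"
cd "$root"
mv web home          # now: $root/home/web
ln -s home/web web   # recreate $root/web as a symlink to home/web
cat web/index.html   # resolves through the symlink on a plain POSIX filesystem
```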
------------------
--- server.vol ---
------------------
# $ /usr/bin/glusterfs-volgen -n tcb_data -p 50001 -r 1 -c /etc/glusterfs 10.0.0.24:/mnt/tcb_data 10.0.0.25:/mnt/tcb_data
######################################
# Start tcb share
######################################
volume tcb_posix
type storage/posix
option directory /mnt/tcb_data
end-volume
volume tcb_locks
type features/locks
subvolumes tcb_posix
end-volume
volume tcb_brick
type performance/io-threads
option thread-count 8
subvolumes tcb_locks
end-volume
volume tcb_server
type protocol/server
option transport-type tcp
option auth.addr.tcb_brick.allow *
option transport.socket.listen-port 50001
option transport.socket.nodelay on
subvolumes tcb_brick
end-volume
------------------
--- tcb client.vol ---
------------------
volume tcb_remote_glust1
type protocol/client
option transport-type tcp
option ping-timeout 5
option remote-host 10.0.0.24
option transport.socket.nodelay on
option transport.remote-port 50001
option remote-subvolume tcb_brick
end-volume
volume tcb_remote_glust2
type protocol/client
option transport-type tcp
option ping-timeout 5
option remote-host 10.0.0.25
option transport.socket.nodelay on
option transport.remote-port 50001
option remote-subvolume tcb_brick
end-volume
volume tcb_mirror
type cluster/replicate
subvolumes tcb_remote_glust1 tcb_remote_glust2
end-volume
volume tcb_writebehind
type performance/write-behind
option cache-size 4MB
subvolumes tcb_mirror
end-volume
volume tcb_readahead
type performance/read-ahead
option page-count 4
subvolumes tcb_writebehind
end-volume
volume tcb_iocache
type performance/io-cache
option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
option cache-timeout 1
subvolumes tcb_readahead
end-volume
volume tcb_quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes tcb_iocache
end-volume
volume tcb_statprefetch
type performance/stat-prefetch
subvolumes tcb_quickread
end-volume
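For reference, the backtick expression embedded in the io-cache cache-size line
above computes 20% of total RAM, truncated to whole MB (/proc/meminfo reports
MemTotal in kB). It can be run standalone on Linux to see the value:

```shell
# MemTotal (kB) * 0.2 / 1024 -> MB; cut drops the fractional part.
grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.
```

If your glusterfs version does not expand backticks when parsing the vol file,
substitute the computed value directly (e.g. option cache-size 512MB).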
Tejas N. Bhise wrote:
Chad, Stephan - thank you for your feedback.
Just to clarify on what you wrote, do you mean to say that:
1) The setup is a replicate setup, with the file being written to multiple nodes.
2) One of these nodes is brought down.
3) A replicated file with a copy on the downed node is written to.
4) The other copies are updated as writes happen while this node is still down.
5) After this node is brought back up, the client sometimes sees the old file
from that node instead of picking the file from a node that has the latest copy.
If the above is correct, quick questions -
1) What versions are you using?
2) Can you share your volume files? Are they generated using volgen?
3) Did you notice any patterns for the files where the wrong copy was picked?
Like, were they open when the node was brought down?
4) Any other way to reproduce the problem?
5) Any other patterns you observed when you see the problem?
6) Would you have listings of problem file(s) from the replica nodes?
If, however, my understanding is not correct, please let me know with some
examples.
Regards,
Tejas.
----- Original Message -----
From: "Chad" <[email protected]>
To: "Stephan von Krawczynski" <[email protected]>
Cc: [email protected]
Sent: Sunday, March 7, 2010 9:32:27 PM GMT +05:30 Chennai, Kolkata, Mumbai, New
Delhi
Subject: Re: [Gluster-users] How to re-sync
I actually do prefer top post.
Well, this "overwritten" behavior is what I saw as well, and that is a REALLY
REALLY bad thing; it is why I asked my question in the first place.
Is there a gluster developer out there working on this problem specifically?
Could we add some kind of "sync done" command that has to be run manually, with
the failed node kept out of service until it has been run?
The bottom line for me is that I would much rather run on a
performance-degraded array until a sysadmin intervenes than lose any data.
Stephan von Krawczynski wrote:
I love top-post ;-)
Generally, you are right. But in real life you cannot rely on this
"smartness". We tested exactly this point and found that the clients
do not always select the correct file version (i.e. the latest) automatically.
Our idea in the test case was to bring down a node, update its kernel, and
revive it, just as you would in the real world for a kernel update.
We found that some files were afterwards taken from the downed node, and
the new contents on the other node were in fact overwritten.
This does not happen in general, of course. But it does happen. We could only
stop this behaviour by setting "favorite-child", but that does not really help
much, since sooner or later we will want to take down every node.
This is in fact one of our show-stoppers.
On Sun, 7 Mar 2010 01:33:14 -0800
Liam Slusser <[email protected]> wrote:
Assuming you used raid1 (replicate), you DO bring up the new machine
and start gluster. On one of your gluster mounts, run an ls -alR
and it will resync the new node. The gluster clients are smart enough
to get the files from the first node.
liam
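A minimal sketch of that resync trigger, assuming the replicated volume is
mounted at /mnt/gluster on a client (the mount point and the find variant are
assumptions; ls -alR through the mount works the same way by stat'ing every
entry):

```shell
# Trigger AFR self-heal on GlusterFS 3.0.x by stat'ing every file through
# a client mount; replicate compares the copies and heals stale ones.
MOUNT=${MOUNT:-/mnt/gluster}   # hypothetical client mount point
if [ -d "$MOUNT" ]; then
    # -print0/-0 handle odd filenames; -r skips the run if the tree is empty
    find "$MOUNT" -noleaf -print0 | xargs -0 -r stat > /dev/null
fi
```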
On Sat, Mar 6, 2010 at 11:48 PM, Chad <[email protected]> wrote:
Ok, so assuming you have N glusterfsd servers (say 2, since it does not
really matter).
Now one of the servers dies.
You repair the machine and bring it back up.
I think 2 things:
1. You should not start glusterfsd on boot (you need to sync the disk first).
2. When it is back up, how do you re-sync it?
Do you rsync the underlying mount points?
If it is a busy gluster cluster, it will be getting new files all the time.
So how do you sync it and bring it back up safely, so that clients don't
connect to an incomplete server?
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users