That seems correct, with one change: not only do I get the old file in step 5,
that old file also overwrites the newer file on the node that did not go down.

> 1) What versions are you using ?
glusterfs 3.0.2 built on Feb  7 2010 00:15:44
Repository revision: v3.0.2

> 2) Can you share your volume files ? Are they generated using volgen ?
I did generate them via volgen, then modified them because I have 3 shares,
but only to rename things.
(vol files at end of e-mail)

> 3) Did you notice any patterns for the files where the wrong copy was picked ?
> Like, were they open when the node was brought down ?
I was not monitoring this.

> 4) Any other way to reproduce the problem ?
See my nfs issue below, although I don't think they are related.

> 5) Any other patterns you observed when you see the problem ?
See my nfs issue below, although I don't think they are related.

> 6) Would you have listings of problem file(s) from the replica nodes ?
No.

Also, I did something today that works over NFS but does not work on gluster.
I have a share mounted on /cs_data.
I have directories in that share: /cs_data/web and /cs_data/home.
I move /cs_data/web into /cs_data/home (so I get /cs_data/home/web), then
symlink /cs_data/web to /cs_data/home/web, like this:
cd /cs_data;
mv web home;
ln -s home/web

On all the clients /cs_data/web does not work anymore.
If I unmount and remount it works again.
Unfortunately, for the unmount/remount to work I have to kill things like httpd.
So to do a simple directory move (because I had it in the wrong place) on a
read-only dir, I have to kill my service.

I have done exactly this with an NFS mount and it did not fail at all; I did
not have to kill httpd and I did not have to unmount/remount the share.
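
To spell the sequence out as a rough sketch (the client A / client B labels are
only for illustration, and ls is just one example of an access that stops working):

# client A (the one doing the move):
cd /cs_data
mv web home            # /cs_data/web is now /cs_data/home/web
ln -s home/web web     # recreate /cs_data/web as a symlink to home/web

# client B (already has /cs_data mounted):
ls /cs_data/web        # fails on the gluster mount until /cs_data is unmounted
                       # and remounted; over NFS the same sequence keeps working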

------------------
--- server.vol ---
------------------
# $ /usr/bin/glusterfs-volgen -n tcb_data -p 50001 -r 1 -c /etc/glusterfs 10.0.0.24:/mnt/tcb_data 10.0.0.25:/mnt/tcb_data

######################################
# Start tcb share
######################################
volume tcb_posix
  type storage/posix
  option directory /mnt/tcb_data
end-volume

volume tcb_locks
    type features/locks
    subvolumes tcb_posix
end-volume

volume tcb_brick
    type performance/io-threads
    option thread-count 8
    subvolumes tcb_locks
end-volume

volume tcb_server
    type protocol/server
    option transport-type tcp
    option auth.addr.tcb_brick.allow *
    option transport.socket.listen-port 50001
    option transport.socket.nodelay on
    subvolumes tcb_brick
end-volume

------------------
--- tcb client.vol ---
------------------
volume tcb_remote_glust1
        type protocol/client
        option transport-type tcp
        option ping-timeout 5
        option remote-host 10.0.0.24
        option transport.socket.nodelay on
        option transport.remote-port 50001
        option remote-subvolume tcb_brick
end-volume

volume tcb_remote_glust2
        type protocol/client
        option transport-type tcp
        option ping-timeout 5
        option remote-host 10.0.0.25
        option transport.socket.nodelay on
        option transport.remote-port 50001
        option remote-subvolume tcb_brick
end-volume

volume tcb_mirror
        type cluster/replicate
        subvolumes tcb_remote_glust1 tcb_remote_glust2
end-volume

volume tcb_writebehind
        type performance/write-behind
        option cache-size 4MB
        subvolumes tcb_mirror
end-volume

volume tcb_readahead
        type performance/read-ahead
        option page-count 4
        subvolumes tcb_writebehind
end-volume

volume tcb_iocache
        type performance/io-cache
        option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
        option cache-timeout 1
        subvolumes tcb_readahead
end-volume

volume tcb_quickread
        type performance/quick-read
        option cache-timeout 1
        option max-file-size 64kB
        subvolumes tcb_iocache
end-volume

volume tcb_statprefetch
        type performance/stat-prefetch
        subvolumes tcb_quickread
end-volume

^C



Tejas N. Bhise wrote:
Chad, Stephan - thank you for your feedback.

Just to clarify on what you wrote, do you mean to say that -

1) The setup is a replicate setup with the file being written to multiple nodes.
2) One of these nodes is brought down.
3) A replicated file with a copy on the node brought down is written to.
4) The other copies are updated as writes happen while this node is still down.
5) After this node is brought back up, the client sometimes sees the old file on
that node instead of picking the file from a node that has the latest copy
(a rough sketch follows).
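
A rough shell sketch of that sequence, assuming the two-node replicate setup from
the vol files above and a client mount at /cs_data (the file name and the way the
node is taken down are only placeholders):

# 1-2) two-node replicate; take one node down (e.g. 10.0.0.25)
ssh 10.0.0.25 'killall glusterfsd'

# 3-4) write to a replicated file through the client mount while that node is down
echo "newer contents" > /cs_data/somefile

# 5) bring 10.0.0.25 back up and start glusterfsd again; the client sometimes
#    serves the stale copy from that node, and (per the reply above) the stale
#    copy can then overwrite the newer one on the node that stayed up
cat /cs_data/somefile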

If the above is correct, quick questions -

1) What versions are you using ?
2) Can you share your volume files ? Are they generated using volgen ?
3) Did you notice any patterns for the files where the wrong copy was picked ? Like, were they open when the node was brought down ?
4) Any other way to reproduce the problem ?
5) Any other patterns you observed when you see the problem ?
6) Would you have listings of problem file(s) from the replica nodes ?

If, however, my understanding was not correct, please let me know with some
examples.

Regards,
Tejas.

----- Original Message -----
From: "Chad" <[email protected]>
To: "Stephan von Krawczynski" <[email protected]>
Cc: [email protected]
Sent: Sunday, March 7, 2010 9:32:27 PM GMT +05:30 Chennai, Kolkata, Mumbai, New 
Delhi
Subject: Re: [Gluster-users] How to re-sync

I actually do prefer top post.

Well, this "overwritten" behavior is what I saw as well, and that is a REALLY
REALLY bad thing, which is why I asked my question in the first place.

Is there a gluster developer out there working on this problem specifically?
Could we add some kind of "sync done" command that has to be run manually, so
that until it has been run the failed node is not used?
The bottom line for me is that I would much rather run on a performance-degraded
array until a sysadmin intervenes than lose any data.

^C



Stephan von Krawczynski wrote:
I love top-post ;-)

Generally, you are right. But in real life you cannot rely on this "smartness".
We tried exactly this and found that the clients do not always select the
correct file version (i.e. the latest) automatically.
Our idea in the test case was to bring down a node, update its kernel and revive
it - just as you would do in the real world for a kernel update.
We found out that some files were taken from the downed node afterwards and
the new contents on the other node got in fact overwritten.
This does not happen generally, of course. But it does happen. We could only
stop this behaviour by setting "favorite-child". But that does not really help
a lot, since we want to take down all nodes some other day.
This is in fact one of our show-stoppers.
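
For reference, favorite-child is set on the replicate volume; a sketch against
the tcb_mirror volume from the vol files earlier in the thread (choosing
tcb_remote_glust1 is only an example; it resolves any conflict in that node's
favour, which is why it does not really help):

volume tcb_mirror
        type cluster/replicate
        # example only: always prefer glust1's copy when the replicas disagree
        option favorite-child tcb_remote_glust1
        subvolumes tcb_remote_glust1 tcb_remote_glust2
end-volume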


On Sun, 7 Mar 2010 01:33:14 -0800
Liam Slusser <[email protected]> wrote:

Assuming you used raid1 (replicate), you DO bring up the new machine
and start gluster.  On one of your gluster mounts you run ls -alR
and it will resync the new node.  The gluster clients are smart enough
to get the files from the first node.
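
A sketch of that resync step, assuming the repaired node is back online and
/cs_data is a client mount of the replicated volume (on 3.0.x a recursive stat
of the whole tree from a client is what triggers self-heal; the mount point is
only an example):

# run from any client mount once the repaired node is back up
ls -alR /cs_data > /dev/null

# equivalent full-tree stat that avoids printing the listing
find /cs_data -print0 | xargs -0 stat > /dev/null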

liam

On Sat, Mar 6, 2010 at 11:48 PM, Chad <[email protected]> wrote:
Ok, so assuming you have N glusterfsd servers (say 2, because it does not
really matter).
Now one of the servers dies.
You repair the machine and bring it back up.

I think 2 things:
1. You should not start glusterfsd on boot (you need to sync the HD first)
2. When it is up how do you re-sync it?

Do you rsync the underlying mount points?
If it is a busy gluster cluster it will be getting new files all the time.
So how do you sync and bring it back up safely so that clients don't connect
to an incomplete server?
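
On point 1 above, a minimal sketch for a Red Hat-style init setup (the
glusterfsd service name is an assumption and may differ per distribution or
package):

# keep the brick from starting automatically on boot
chkconfig glusterfsd off
# after the node is repaired and its data has been synced, start it by hand
service glusterfsd start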

^C
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
