This has been tested using the most recent build of 3.1.2 (built Jan 18 2011 
11:19:54)
System setup:
Volume Name: brick
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: linguest2:/data/exp
Brick2: linguest3:/data/exp
Brick3: linguest4:/data/exp
Brick4: linguest5:/data/exp
Suppose you have a split-brain situation: file1.txt with contents "content from 
split 1" is copied to the left side of the split, and file1.txt with contents 
"contents from split 2" is copied to the right side. When the split is 
recovered, the files are left on the machines they were copied to. (That is 
fine, as Gluster have already said that the new version 3.1.2 does not cope 
with split brain anymore.) But if you then read the file on either of the 
machines, you get this log entry:
[2011-02-09 09:36:15.432679] I 
[afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] brick-replicate-0: 
background  data self-heal completed on /file1.txt

This log entry is repeated every time you access the file.
Then, to try to fix it, I changed file1.txt and copied it to the machine that 
would have been on the left side of the split. My expectation was that this 
would replicate out to all the machines and overwrite the existing file.
But all that happened was that file1.txt stayed on that machine and was not 
replicated out. Also, the date on the file had been changed to 1970-01-01?
I have also run a rebalance, which did nothing to fix this issue.
I now have two machines that are inconsistent, and I cannot see how to fix this 
or how I would get a monitoring system to detect it: there are no errors in the 
log, and "data self-heal completed" can also be logged for files in other 
scenarios.
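
One rough way to monitor for this, sketched under assumptions rather than 
taken from any Gluster tooling: count how often the same file shows up in 
"data self-heal completed" entries and flag files that are healed again and 
again. The pattern is modelled only on the single entry quoted above, it 
assumes each entry sits on one line in the real log file, and the names 
count_heals, suspicious and the threshold of 3 are hypothetical.

```python
import re
import sys
from collections import Counter

# Pattern modelled on the one log entry quoted in this post (assumption:
# other GlusterFS versions may word or lay out this message differently).
HEAL_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+I\s+\[afr-self-heal-common\.c:\d+:"
    r"afr_self_heal_completion_cbk\]\s+(?P<subvol>\S+):\s+"
    r"background\s+data self-heal completed on (?P<path>.+)$"
)


def count_heals(lines):
    """Count 'data self-heal completed' entries per (subvolume, path)."""
    counts = Counter()
    for line in lines:
        m = HEAL_RE.match(line.strip())
        if m:
            counts[(m.group("subvol"), m.group("path"))] += 1
    return counts


def suspicious(counts, threshold=3):
    """Return (subvolume, path) pairs healed at least `threshold` times.

    The threshold is a hypothetical cutoff, not a Gluster setting.
    """
    return sorted(key for key, n in counts.items() if n >= threshold)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python heal_watch.py /path/to/client.log
    with open(sys.argv[1]) as log:
        for subvol, path in suspicious(count_heals(log)):
            print(f"{subvol}: {path} is being self-healed repeatedly")
```

A file that is healed once is probably fine; one that is "healed" on every 
access, as file1.txt is here, would stand out. This only detects the symptom 
and does not say which copy is the good one.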

Thanks
Nick
