Hi,

Based on an internal discussion we had, I am putting forward some points on the proposed changes:


*lookup*: For files in data split-brain (DSB), allow lookup to succeed and return the inode attributes (struct iatt) from the copy that has the larger size.

For files in metadata split-brain (MSB), allow lookup to succeed and use the below resolution:

*Mismatching attribute*        *Resolution*
time (a_time/m_time/c_time)    return the one which has the newest m_time
uid/gid                        return the uid/gid as root:root so that further FOPs fail due to lack of permission
nlink                          return the bigger of the values
file permission (st_mode)      return the AND of the file permissions


For files in entry split-brain (ESB), lookup has to fail.
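
The DSB and MSB lookup rules above can be sketched roughly as follows. This is only an illustration in Python, not AFR code: `Iatt` is a hypothetical stand-in for the few struct iatt fields involved, and `resolve_dsb`/`resolve_msb` are made-up names. Two points are assumptions of the sketch rather than part of the proposal: all three timestamps are taken from the copy with the newest m_time, and root:root is substituted only when the owners actually differ.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass
class Iatt:
    """Hypothetical stand-in for the struct iatt fields used below."""
    size: int
    atime: int
    mtime: int
    ctime: int
    uid: int
    gid: int
    nlink: int
    mode: int  # permission bits only


def resolve_dsb(replies: List[Iatt]) -> Iatt:
    """Data split-brain: report the attributes of the larger copy."""
    return max(replies, key=lambda r: r.size)


def resolve_msb(replies: List[Iatt]) -> Iatt:
    """Metadata split-brain: merge the attributes per the table above."""
    newest = max(replies, key=lambda r: r.mtime)
    owners = {(r.uid, r.gid) for r in replies}
    # Assumption: root:root is substituted only when the owners differ.
    uid, gid = (0, 0) if len(owners) > 1 else next(iter(owners))
    return Iatt(
        size=newest.size,    # assumption: sizes agree in a pure MSB
        atime=newest.atime,  # assumption: all three times come from
        mtime=newest.mtime,  # the copy with the newest m_time
        ctime=newest.ctime,
        uid=uid,
        gid=gid,
        nlink=max(r.nlink for r in replies),
        mode=reduce(lambda a, b: a & b, (r.mode for r in replies)),
    )
```

With this sketch, copies with modes 0777 and 0744 would be reported as 0744, since only the permission bits set on every copy survive the AND.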


Note that if lookup gets called before the other FOPs, then the above is the expected behaviour. If it doesn't (due to caching, or because the split-brain occurs after the lookup, etc.), then we need to define what happens on each FOP:

*stat*: If the file is in split-brain, send stat to all subvolumes and perform the same steps as in lookup (i.e. perform the same checks as above).

*write*: Allow writes to go through irrespective of the type of split-brain. This differs markedly from the current behaviour, where we disallow writes to DSB files. The rationale is that the write could include a truncate to zero, which is a valid way of resolving the split-brained file if the user wishes to do so.

*read*: Do not allow reads irrespective of the type of split-brain. This would serve as an indication to the user that the file is in split-brain.

*get(f)attr*: For DSB, allow it.
            For MSB, don't allow it.

*set(f)attr*: For DSB and MSB, allow it.

*touch (create), hardlink, softlink, rename, chown, chmod, unlink*: Allow the operation for all types of split-brain.
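
The per-FOP rules above can be collected into one lookup table, which may make the gaps easier to review. The following Python sketch is only illustrative: `SB`, `POLICY` and `is_allowed` are hypothetical names, and a result of `None` means the proposal above does not state the behaviour for that combination (for example get(f)attr on an ESB file).

```python
from enum import Enum
from typing import Dict, Optional


class SB(Enum):
    DATA = "DSB"
    METADATA = "MSB"
    ENTRY = "ESB"


ALLOW_ALL = {SB.DATA: True, SB.METADATA: True, SB.ENTRY: True}
DENY_ALL = {SB.DATA: False, SB.METADATA: False, SB.ENTRY: False}
LIKE_LOOKUP = {SB.DATA: True, SB.METADATA: True, SB.ENTRY: False}

# True = the FOP is allowed to proceed, False = it is failed.
POLICY: Dict[str, Dict[SB, bool]] = {
    "lookup": LIKE_LOOKUP,
    "stat": LIKE_LOOKUP,  # stat performs the same checks as lookup
    "write": ALLOW_ALL,
    "read": DENY_ALL,
    "get(f)attr": {SB.DATA: True, SB.METADATA: False},  # ESB not stated
    "set(f)attr": {SB.DATA: True, SB.METADATA: True},   # ESB not stated
}
for _fop in ("touch", "hardlink", "softlink", "rename",
             "chown", "chmod", "unlink"):
    POLICY[_fop] = ALLOW_ALL


def is_allowed(fop: str, kind: SB) -> Optional[bool]:
    """Return True/False per the proposal, or None where it is silent."""
    return POLICY[fop].get(kind)
```

For instance, `is_allowed("read", SB.DATA)` is `False` while `is_allowed("write", SB.DATA)` is `True`, which is the change from the current behaviour described under *write*.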

Forcing lookups to occur for readdirp: If a directory is in split-brain and a *readdirp* is issued, then after getting the entries AFR needs to check them for split-brain. For those entries which are in split-brain, it needs to set the inode to NULL before unwinding the reply to the parent xlator. What we are essentially doing here is downgrading a readdirp to a readdir, thereby ensuring that a lookup is always triggered if that file is accessed again.
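
The readdirp handling described above amounts to a filter over the returned entries. A minimal Python sketch, assuming a hypothetical `DirEntry` type and an `in_split_brain` predicate supplied by the caller (neither is an AFR name):

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DirEntry:
    """Hypothetical readdirp entry: a name plus an optionally attached inode."""
    name: str
    inode: Optional[int]  # None models a NULL inode pointer


def strip_split_brain_inodes(
    entries: List[DirEntry],
    in_split_brain: Callable[[str], bool],
) -> List[DirEntry]:
    """Drop the inode from entries in split-brain so that a later access
    has to go through lookup; other entries are passed through unchanged."""
    return [
        replace(e, inode=None) if in_split_brain(e.name) else e
        for e in entries
    ]
```

Entries that are not in split-brain keep their inode, so only the affected files pay the cost of the extra lookup.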

Thanks,
Ravi




On 12/27/2013 04:40 PM, Ravishankar N wrote:



-------- Original Message --------
Subject:        Re: [Gluster-users] Fencing FOPs on data-split-brained files
Date:   Tue, 19 Nov 2013 16:03:14 +0530
From:   Ravishankar N <ravishan...@redhat.com>
To:     Anand Avati <av...@gluster.org>
CC: Gluster Devel <gluster-devel@nongnu.org>, "gluster-us...@gluster.org" <gluster-us...@gluster.org>



On 11/16/2013 01:42 AM, Anand Avati wrote:
Ravi,
We should not mix up data and entry operation domains. If a file is in data split-brain, that should not stop a user from performing rename/link/unlink operations on the file.

Regarding your concern about complications while healing - we should change our "manual fixing" instructions to:

- go to backend, access through gfid path or normal path
- rmxattr the afr changelogs
- truncate the file to 0 bytes (like "> filename")

Accessing the path through gfid and truncating to 0 bytes addresses your concerns about hardlinks/renames.
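
A rough Python sketch of the last two steps, to be read as an illustration only. `reset_copy` is a hypothetical helper, the `trusted.afr.` prefix for the AFR changelog xattrs is an assumption that should be checked against the xattrs actually present on the brick, and the mail does not say which copy to run this on; the sketch assumes it is the copy the admin has decided to discard. `os.listxattr` and `os.removexattr` are only available on Linux, so the xattr functions can be passed in.

```python
import os
from typing import Callable, List, Optional

AFR_XATTR_PREFIX = "trusted.afr."  # assumption: prefix of AFR changelog xattrs


def reset_copy(
    path: str,
    list_xattrs: Optional[Callable[[str], List[str]]] = None,
    remove_xattr: Optional[Callable[[str, str], object]] = None,
) -> List[str]:
    """Remove the AFR changelog xattrs from `path` (gfid path or normal
    path on the backend) and truncate it to 0 bytes.

    Returns the names of the xattrs that were removed.
    """
    list_xattrs = list_xattrs or os.listxattr      # Linux only
    remove_xattr = remove_xattr or os.removexattr  # Linux only
    removed = []
    for name in list_xattrs(path):
        if name.startswith(AFR_XATTR_PREFIX):
            remove_xattr(path, name)
            removed.append(name)
    os.truncate(path, 0)  # same effect as "> filename"
    return removed
```

This is destructive, so it should only ever be pointed at the copy that is meant to be overwritten by the heal.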

Avati



/Resending the mail as there was no response. -Ravi/

All,

I have tabulated which operations must/mustn't be permitted for the different types of split-brain. Some of the cells are '?' as I am not sure what the expected behaviour should be. Could we have this validated?


*File operation permitted, by type of split-brain*

*Operation*            *Data SB*  *Metadata SB*        *Entry SB: same entry,*  *Entry SB: different*
                                                       *gfid mismatch*          *entries*
write                  No         Yes (currently no)   No                       Yes
read                   No         Yes (currently no)   No                       Yes
getfattr               Yes        No                   No                       Yes
lookup                 ?          ?                    No                       Yes
stat/fstat             ?          ?                    No                       Yes
setfattr               Yes        No                   No                       Yes
touch                  Yes        Yes                  No                       Yes
hard link creation     Yes        Yes                  No                       Yes
soft link creation     Yes        Yes                  Yes                      Yes
rename                 Yes        Yes                  No                       Yes
chown                  Yes        Yes                  Currently No             Yes
chmod                  Yes        Yes                  Currently No             Yes
unlink                 Yes        Yes                  Currently No             Yes
readdir                N/A        N/A                  ?                        ?


- stat() also reports the file size. If a data split-brained file has different sizes on the bricks, should stat succeed?
- Likewise, if a metadata split-brain is due to different access permissions, say one brick has the file chmod'ed to 777 and the other brick has it at 744, should we allow read/write if the corresponding permission bits are *not* conflicting? (As of today they aren't allowed.)

Also, in the table above, entry split-brain has 2 cases:
i) where same entry has different gfids
ii) each brick has different entries for the same directory (which can cause deleted files to appear in case of conservative merge).
Should we allow readdir in either case?

Thanks,
Ravi

On Wed, Nov 13, 2013 at 3:01 AM, Ravishankar N <ravishan...@redhat.com> wrote:

    Hi,

    Currently in glusterfs, when there is a data split-brain (only) on
    a file, we disallow the following operations from the mount-point
    by returning EIO to the application:
    - Writes to the file (truncate, dd, echo, cp etc)
    - Reads of the file (cat)
    - Reading extended attributes (getfattr) [1]

    However we do permit the following operations:
    - creating hardlinks
    - creating symlinks
    - mv
    - setattr
    - chmod
    - chown
    - touch
    - ls
    - stat

    While it makes sense to allow `ls` and `stat`, is it okay to add
    checks in the FOPs to disallow the other operations? Allowing
    creation of links and changing file attributes only seems to
    complicate things before the admin can go to the backend bricks
    and resolve the split-brain (by deleting all but the healthy copy
    of the file, including hardlinks). More so if the file is renamed
    before addressing the split-brain.
    Please share your thoughts.

    Thanks,
    Ravi

    [1] http://review.gluster.org/#/c/5988/
    _______________________________________________
    Gluster-users mailing list
    gluster-us...@gluster.org
    http://supercolony.gluster.org/mailman/listinfo/gluster-users






_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel
