Hi,

As I previously described, my root file system is located in RAM, so I'll lose the GlusterFS
volume definition(s) in case of a reboot. However, I would like to back up the required files
to a mounted disk so that they can be restored to /etc after the reboot. Which files would I
have to back up and restore to be able to run 'gluster volume start' without first
re-creating the volume?
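To make the question concrete, this is roughly what I'm planning. I'm assuming the volume
definitions live in glusterd's default working directory /var/lib/glusterd (plus the volfiles
in /etc/glusterfs), and /mnt/persist below is just a placeholder for my mounted disk, so
please correct me if other files are needed:

    # --- before reboot: save the GlusterFS state to the persistent disk ---
    cp -a /var/lib/glusterd /mnt/persist/glusterd
    cp -a /etc/glusterfs    /mnt/persist/glusterfs

    # --- after reboot (fresh RAM root): restore the state, then start glusterd ---
    mkdir -p /var/lib/glusterd /etc/glusterfs
    cp -a /mnt/persist/glusterd/.  /var/lib/glusterd/
    cp -a /mnt/persist/glusterfs/. /etc/glusterfs/
    glusterd                        # start the management daemon
    gluster volume start <volname>  # should now work without re-creating the volume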
Regards
Andreas

On 11/05/14 12:23, Ravishankar N wrote:
> On 11/05/2014 03:18 PM, Andreas Hollaus wrote:
>> Hi,
>>
>> I'm curious about this 5-phase transaction scheme that is described in the document
>> (lock, pre-op, op, post-op, unlock). Are these stage switches all triggered from the
>> client, or can the server do it without notifying the client, for instance switching
>> from 'op' to 'post-op'?
>
> All stages are performed by the AFR translator in the client graph, where it is
> loaded, in the sequence you listed.
>
>> Decreasing the counter for the local pending operations could be done without
>> talking to the client, even though I realize a message has to be sent to the other
>> server(s), possibly through the client.
>>
>> The reason I ask is that I'm trying to estimate the risk of ending up in a
>> split-brain situation, or at least understand whether our servers will 'accuse' each
>> other temporarily during this 5-phase transaction under normal circumstances. If I
>> understand who sends messages to whom and in what order, I'll have a better chance
>> to see whether we require any solution to split-brain situations. As I've
>> experienced problems setting up the 'favorite-child' option, I want to know whether
>> it's required or not. In our use case, quorum is not a solution, but losing some
>> data is acceptable as long as the bricks are in sync.
>
> If a file is split-brained, AFR does not allow modifications by clients on it until
> the split-brain is resolved. The afr xattrs and heal mechanisms ensure that the
> bricks are in sync, so no worries on that front.
>
> Thanks,
> Ravi
>
>> Regards
>> Andreas
>>
>> On 10/31/14 15:37, Ravishankar N wrote:
>>> On 10/30/2014 07:23 PM, Andreas Hollaus wrote:
>>>> Hi,
>>>>
>>>> Thanks! Seems like an interesting document. Although I've read blogs about how
>>>> extended attributes are used as a change log, this seems like a more comprehensive
>>>> document.
>>>>
>>>> I won't write directly to any brick. That's the reason I first have to create a
>>>> volume which consists of only one brick, until the other server is available, and
>>>> then add that second brick. I don't want to delay the file system clients until
>>>> the second server is available, hence the reason for add-brick.
>>>>
>>>> I guess that this procedure is only needed the first time the volume is
>>>> configured, right? If any of these bricks would fail later on, the change log
>>>> would keep track of all changes to the file system even though only one of the
>>>> bricks is available(?).
>>>
>>> Yes, if one brick of a replica pair goes down, the other one keeps track of file
>>> modifications by the client, and would sync them back to the first one when it
>>> comes back up.
>>>
>>>> After a restart, volume settings stored in the configuration file would be
>>>> accepted even though not all servers were up and running yet at that time,
>>>> wouldn't they?
>>>
>>> glusterd running on all nodes ensures that the volume configurations stored on each
>>> node are in sync.
>>>
>>>> Speaking about configuration files: when are these copied to each server?
>>>> If I create a volume which consists of two bricks, I guess that those servers will
>>>> create the configuration files, independently of each other, from the information
>>>> sent from the client (gluster volume create...).
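Side note on the pending-counter xattrs discussed above: assuming I've read the afr-v1.md
document correctly, they can be inspected directly on a brick with getfattr. 'gv0' and the
paths below are only examples from my test setup:

    # run on a server, against a file inside the brick directory (not via the mount)
    getfattr -d -m . -e hex /data/brick1/gv0/somefile
    # per afr-v1.md, look for trusted.afr.gv0-client-0 and trusted.afr.gv0-client-1:
    # non-zero pending counters there are the temporary 'accusations' raised while a
    # lock/pre-op/op/post-op/unlock transaction is in flight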
>>> All volume config/management commands must be run from any of the servers that make
>>> up the volume, and not from the client (unless both happen to be on the same
>>> machine). As mentioned above, when any of the volume commands is run on any one
>>> server, glusterd orchestrates the necessary action on all servers and keeps them in
>>> sync.
>>>
>>>> In case I later on add a brick, I guess that the settings have to be copied to the
>>>> new brick after they have been modified on the first one, right (or will they be
>>>> recreated on all servers from the information specified by the client, like in the
>>>> previous case)?
>>>>
>>>> Will configuration files be copied in other situations as well, for instance in
>>>> case one of the servers which is part of the volume would for some reason be
>>>> missing those files? In my case, the root file system is recreated from an image
>>>> at each reboot, so everything created in /etc will be lost. Will GlusterFS
>>>> settings be restored from the other server automatically
>>>
>>> No, it is expected that servers have persistent file systems. There are ways to
>>> restore such bricks; see
>>> http://gluster.org/community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server
>>>
>>> -Ravi
>>>
>>>> or do I need to back up and restore those myself? Even though the brick doesn't
>>>> know that it is part of a volume in case it loses the configuration files, both
>>>> the other server(s) and the client(s) will probably recognize it as being part of
>>>> the volume. I therefore believe that such a self-healing would actually be
>>>> possible, even though it may not be implemented.
>>>>
>>>> Regards
>>>> Andreas
>>>>
>>>> On 10/30/14 05:21, Ravishankar N wrote:
>>>>> On 10/28/2014 03:58 PM, Andreas Hollaus wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm curious about how GlusterFS manages to sync the bricks in the initial phase,
>>>>>> when the volume is created or extended.
>>>>>>
>>>>>> I first create a volume consisting of only one brick, which clients will start
>>>>>> to read and write. After a while I add a second brick to the volume to create a
>>>>>> replicated volume.
>>>>>>
>>>>>> If this new brick is empty, I guess that files will be copied from the first
>>>>>> brick to get the bricks in sync, right?
>>>>>>
>>>>>> However, if the second brick is not empty but rather contains a subset of the
>>>>>> files on the first brick, I don't see how GlusterFS will solve the problem of
>>>>>> syncing the bricks.
>>>>>>
>>>>>> I guess that all files which lack extended attributes could be removed in this
>>>>>> scenario, because they were created when the disk was not part of a GlusterFS
>>>>>> volume. However, in case the brick was used in the volume previously, for
>>>>>> instance before that server restarted, there will be extended attributes for the
>>>>>> files on the second brick which weren't updated during the downtime (when the
>>>>>> volume consisted of only one brick). There could be multiple changes to the
>>>>>> files during this time. In this case I don't understand how the extended
>>>>>> attributes could be used to determine which of the bricks contains the most
>>>>>> recent file.
>>>>>>
>>>>>> Can anyone explain how this works? Is it only allowed to add empty bricks to a
>>>>>> volume?
>>>>>>
>>>>>>
>>>>> It is allowed to add only empty bricks to the volume. Writing directly to bricks
>>>>> is not supported; one needs to access the volume only from a mount point or using
>>>>> libgfapi.
>>>>> After adding a brick to increase the distribute count, you need to run the volume
>>>>> rebalance command so that some of the existing files are hashed (moved) to this
>>>>> newly added brick.
>>>>> After adding a brick to increase the replica count, you need to run the volume
>>>>> heal full command to sync the files from the other replica into the newly added
>>>>> brick.
>>>>> https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md will give
>>>>> you an idea of how the replicate translator uses xattrs to keep files in sync.
>>>>>
>>>>> HTH,
>>>>> Ravi
>>
>

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
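PS. Just so I'm sure I've understood the add-brick procedure discussed earlier in this
thread, this is the sequence I intend to use once the second server is up. The names (gv0,
server1, server2 and the brick paths) are only examples from my setup:

    # create and start the volume on the first server while the second one is still down
    gluster volume create gv0 server1:/data/brick1/gv0
    gluster volume start gv0

    # later, when the second server is reachable: grow it into a replica 2 volume
    gluster peer probe server2
    gluster volume add-brick gv0 replica 2 server2:/data/brick1/gv0

    # sync the existing files onto the newly added (empty) brick
    gluster volume heal gv0 full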
