Hi,

As I previously described, my root file system is located in RAM, so I'll lose the GlusterFS
volume definition(s) in case of a reboot. However, I would like to back up the required files
to a mounted disk so that they can be restored to /etc after the reboot. Which files would I
have to back up and restore to be able to run 'gluster volume start' without first
re-creating the volume?
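To make the question concrete, this is roughly what I'm planning. I'm assuming the volume
definitions live in glusterd's default working directory /var/lib/glusterd (plus the volfiles
in /etc/glusterfs), and /mnt/persist below is just a placeholder for my mounted disk, so
please correct me if other files are needed:

    # --- before reboot: save the GlusterFS state to the persistent disk ---
    cp -a /var/lib/glusterd /mnt/persist/glusterd
    cp -a /etc/glusterfs    /mnt/persist/glusterfs

    # --- after reboot (fresh RAM root): restore the state, then start glusterd ---
    mkdir -p /var/lib/glusterd /etc/glusterfs
    cp -a /mnt/persist/glusterd/.  /var/lib/glusterd/
    cp -a /mnt/persist/glusterfs/. /etc/glusterfs/
    glusterd                        # start the management daemon
    gluster volume start <volname>  # should now work without re-creating the volume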
Regards
Andreas

On 11/05/14 12:23, Ravishankar N wrote:
> On 11/05/2014 03:18 PM, Andreas Hollaus wrote:
>> Hi,
>>
>> I'm curious about this 5-phase transaction scheme that is described in the document
>> (lock, pre-op, op, post-op, unlock). Are these stage switches all triggered from the
>> client, or can the server do it without notifying the client, for instance switching
>> from 'op' to 'post-op'?
>
> All stages are performed by the AFR translator in the client graph, where it is
> loaded, in the sequence you listed.
>
>> Decreasing the counter for the local pending operations could be done without
>> talking to the client, even though I realize a message has to be sent to the other
>> server(s), possibly through the client.
>>
>> The reason I ask is that I'm trying to estimate the risk of ending up in a
>> split-brain situation, or at least understand whether our servers will 'accuse' each
>> other temporarily during this 5-phase transaction under normal circumstances. If I
>> understand who sends messages to whom and in what order, I'll have a better chance
>> to see whether we require any solution to split-brain situations. As I've
>> experienced problems setting up the 'favorite-child' option, I want to know whether
>> it's required or not. In our use case, quorum is not a solution, but losing some
>> data is acceptable as long as the bricks are in sync.
>
> If a file is split-brained, AFR does not allow modifications by clients on it until
> the split-brain is resolved. The afr xattrs and heal mechanisms ensure that the
> bricks are in sync, so no worries on that front.
>
> Thanks,
> Ravi
>
>> Regards
>> Andreas
>>
>> On 10/31/14 15:37, Ravishankar N wrote:
>>> On 10/30/2014 07:23 PM, Andreas Hollaus wrote:
>>>> Hi,
>>>>
>>>> Thanks! Seems like an interesting document. Although I've read blogs about how
>>>> extended attributes are used as a change log, this seems like a more comprehensive
>>>> document.
>>>>
>>>> I won't write directly to any brick. That's the reason I first have to create a
>>>> volume which consists of only one brick, until the other server is available, and
>>>> then add that second brick. I don't want to delay the file system clients until
>>>> the second server is available, hence the reason for add-brick.
>>>>
>>>> I guess that this procedure is only needed the first time the volume is
>>>> configured, right? If any of these bricks would fail later on, the change log
>>>> would keep track of all changes to the file system even though only one of the
>>>> bricks is available(?).
>>>
>>> Yes, if one brick of a replica pair goes down, the other one keeps track of file
>>> modifications by the client, and would sync them back to the first one when it
>>> comes back up.
>>>
>>>> After a restart, volume settings stored in the configuration file would be
>>>> accepted even though not all servers were up and running yet at that time,
>>>> wouldn't they?
>>>
>>> glusterd running on all nodes ensures that the volume configurations stored on each
>>> node are in sync.
>>>
>>>> Speaking about configuration files: when are these copied to each server?
>>>> If I create a volume which consists of two bricks, I guess that those servers will
>>>> create the configuration files, independently of each other, from the information
>>>> sent from the client (gluster volume create...).
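Side note on the pending-counter xattrs discussed above: assuming I've read the afr-v1.md
document correctly, they can be inspected directly on a brick with getfattr. 'gv0' and the
paths below are only examples from my test setup:

    # run on a server, against a file inside the brick directory (not via the mount)
    getfattr -d -m . -e hex /data/brick1/gv0/somefile
    # per afr-v1.md, look for trusted.afr.gv0-client-0 and trusted.afr.gv0-client-1:
    # non-zero pending counters there are the temporary 'accusations' raised while a
    # lock/pre-op/op/post-op/unlock transaction is in flight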
>>> All volume config/management commands must be run from any of the servers that make
>>> up the volume, and not from the client (unless both happen to be on the same
>>> machine). As mentioned above, when any of the volume commands is run on any one
>>> server, glusterd orchestrates the necessary action on all servers and keeps them in
>>> sync.
>>>
>>>> In case I later on add a brick, I guess that the settings have to be copied to the
>>>> new brick after they have been modified on the first one, right (or will they be
>>>> recreated on all servers from the information specified by the client, like in the
>>>> previous case)?
>>>>
>>>> Will configuration files be copied in other situations as well, for instance in
>>>> case one of the servers which is part of the volume would for some reason be
>>>> missing those files? In my case, the root file system is recreated from an image
>>>> at each reboot, so everything created in /etc will be lost. Will GlusterFS
>>>> settings be restored from the other server automatically
>>>
>>> No, it is expected that servers have persistent file systems. There are ways to
>>> restore such bricks; see
>>> http://gluster.org/community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server
>>>
>>> -Ravi
>>>
>>>> or do I need to back up and restore those myself? Even though the brick doesn't
>>>> know that it is part of a volume in case it loses the configuration files, both
>>>> the other server(s) and the client(s) will probably recognize it as being part of
>>>> the volume. I therefore believe that such a self-healing would actually be
>>>> possible, even though it may not be implemented.
>>>>
>>>> Regards
>>>> Andreas
>>>>
>>>> On 10/30/14 05:21, Ravishankar N wrote:
>>>>> On 10/28/2014 03:58 PM, Andreas Hollaus wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm curious about how GlusterFS manages to sync the bricks in the initial phase,
>>>>>> when the volume is created or extended.
>>>>>>
>>>>>> I first create a volume consisting of only one brick, which clients will start
>>>>>> to read and write. After a while I add a second brick to the volume to create a
>>>>>> replicated volume.
>>>>>>
>>>>>> If this new brick is empty, I guess that files will be copied from the first
>>>>>> brick to get the bricks in sync, right?
>>>>>>
>>>>>> However, if the second brick is not empty but rather contains a subset of the
>>>>>> files on the first brick, I don't see how GlusterFS will solve the problem of
>>>>>> syncing the bricks.
>>>>>>
>>>>>> I guess that all files which lack extended attributes could be removed in this
>>>>>> scenario, because they were created when the disk was not part of a GlusterFS
>>>>>> volume. However, in case the brick was used in the volume previously, for
>>>>>> instance before that server restarted, there will be extended attributes for the
>>>>>> files on the second brick which weren't updated during the downtime (when the
>>>>>> volume consisted of only one brick). There could be multiple changes to the
>>>>>> files during this time. In this case I don't understand how the extended
>>>>>> attributes could be used to determine which of the bricks contains the most
>>>>>> recent file.
>>>>>>
>>>>>> Can anyone explain how this works? Is it only allowed to add empty bricks to a
>>>>>> volume?
>>>>>>
>>>>>>
>>>>> It is allowed to add only empty bricks to the volume. Writing directly to bricks
>>>>> is not supported; one needs to access the volume only from a mount point or using
>>>>> libgfapi.
>>>>> After adding a brick to increase the distribute count, you need to run the volume
>>>>> rebalance command so that some of the existing files are hashed (moved) to this
>>>>> newly added brick.
>>>>> After adding a brick to increase the replica count, you need to run the volume
>>>>> heal full command to sync the files from the other replica into the newly added
>>>>> brick.
>>>>> https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md will give
>>>>> you an idea of how the replicate translator uses xattrs to keep files in sync.
>>>>>
>>>>> HTH,
>>>>> Ravi
>>
>

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
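PS. Just so I'm sure I've understood the add-brick procedure discussed earlier in this
thread, this is the sequence I intend to use once the second server is up. The names (gv0,
server1, server2 and the brick paths) are only examples from my setup:

    # create and start the volume on the first server while the second one is still down
    gluster volume create gv0 server1:/data/brick1/gv0
    gluster volume start gv0

    # later, when the second server is reachable: grow it into a replica 2 volume
    gluster peer probe server2
    gluster volume add-brick gv0 replica 2 server2:/data/brick1/gv0

    # sync the existing files onto the newly added (empty) brick
    gluster volume heal gv0 full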
