On Tue, Jun 20, 2017 at 4:12 PM, Aravinda <avish...@redhat.com> wrote:
> I think following format can be easily adopted by all components
>
> UUIDs of a subvolume are separated by space and subvolumes are
> separated by comma
>
> For example, node1 and node2 are replica with U1 and U2 UUIDs
> respectively, and node3 and node4 are replica with U3 and U4 UUIDs
> respectively
>
> node-uuid can return "U1 U2,U3 U4"
>
> Geo-rep can split by "," and then split by space and take first UUID
> DHT can split the value by space or comma and get unique UUIDs list
>
> Another question is about the behavior when a node is down: the
> existing node-uuid xattr will not return that UUID if a node is down.

After the change [1], if a node is down we send all zeros as the uuid
for that node in the list of node uuids.

[1] https://review.gluster.org/#/c/17084/

Regards,
Karthik

> What is the behavior with the proposed xattr?
>
> Let me know your thoughts.
>
> regards
> Aravinda VK
>
> On 06/20/2017 03:06 PM, Aravinda wrote:
>> Hi Xavi,
>>
>> On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
>>> Hi Aravinda,
>>>
>>> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>>>> Adding more people to get a consensus about this.
>>>>
>>>> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avish...@redhat.com> wrote:
>>>>
>>>> regards
>>>> Aravinda VK
>>>>
>>>> On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>>>>
>>>> Hi Pranith,
>>>>
>>>> adding gluster-devel, Kotresh and Aravinda,
>>>>
>>>> On 20/06/17 09:45, Pranith Kumar Karampuri wrote:
>>>>
>>>> On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez <xhernan...@datalab.es> wrote:
>>>>
>>>> On 20/06/17 09:31, Pranith Kumar Karampuri wrote:
>>>>
>>>> The way geo-replication works is: on each machine, it does getxattr
>>>> of node-uuid and checks if its own uuid is present in the list.
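For reference, the "U1 U2,U3 U4" format proposed above could be consumed by the two components roughly like this (an illustrative sketch only, not actual Gluster code; function names are invented):

```python
# Sketch of parsing the proposed node-uuid value: subvolumes are
# separated by commas, UUIDs within a subvolume by spaces.

def georep_first_uuids(value):
    """Geo-rep: split by ',' then by space, take the first UUID per subvolume."""
    return [subvol.split()[0] for subvol in value.split(",")]

def dht_unique_uuids(value):
    """DHT: split by space or comma and collect the unique UUIDs (order kept)."""
    seen = []
    for uuid in value.replace(",", " ").split():
        if uuid not in seen:
            seen.append(uuid)
    return seen

value = "U1 U2,U3 U4"
print(georep_first_uuids(value))  # ['U1', 'U3']
print(dht_unique_uuids(value))    # ['U1', 'U2', 'U3', 'U4']
```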
If it is present then it
>>>> will be considered active, otherwise it will be considered passive.
>>>> With this change we are giving all uuids instead of the first-up
>>>> subvolume, so all machines think they are ACTIVE, which is bad
>>>> apparently. So that is the reason. Even I felt bad that we are
>>>> doing this change.
>>>>
>>>> And what about changing the content of node-uuid to include some
>>>> sort of hierarchy? For example:
>>>>
>>>> a single brick:
>>>>
>>>> NODE(<guid>)
>>>>
>>>> AFR/EC:
>>>>
>>>> AFR[2](NODE(<guid>), NODE(<guid>))
>>>> EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>>>>
>>>> DHT:
>>>>
>>>> DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)), AFR[2](NODE(<guid>), NODE(<guid>)))
>>>>
>>>> This gives a lot of information that can be used to take the
>>>> appropriate decisions.
>>>>
>>>> I guess that is not backward compatible. Shall I CC gluster-devel
>>>> and Kotresh/Aravinda?
>>>>
>>>> Is the change we did backward compatible? If we only require the
>>>> first field to be a GUID to support backward compatibility, we can
>>>> use something like this:
>>>>
>>>> No. But the necessary change can be made to the Geo-rep code as
>>>> well if the format is changed, since all these are built/shipped
>>>> together.
>>>>
>>>> Geo-rep uses node-uuid as follows:
>>>>
>>>> list = listxattr(node-uuid)
>>>> active_node_uuids = list.split(SPACE)
>>>> active_node_flag = True if self.node_id exists in active_node_uuids else False
>>>
>>> How was this case solved?
>>>
>>> Suppose we have three servers and 2 bricks in each server.
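The geo-rep check sketched in the pseudocode above, combined with Karthik's note that a down node now reports all zeros, could look like this (a hedged sketch with invented names; not the actual geo-rep source):

```python
# A worker is Active iff its own node uuid appears in the node-uuid
# xattr value. A down node is reported as an all-zeros uuid (per the
# change referenced earlier), which can never match a live worker.

NULL_UUID = "00000000-0000-0000-0000-000000000000"

def is_active_worker(own_uuid, node_uuid_value):
    """Return True if this worker's node uuid is in the xattr value."""
    active_node_uuids = node_uuid_value.split()
    return own_uuid in active_node_uuids

# 'u-aaaa' is up, its replica peer is down (all zeros):
value = "u-aaaa " + NULL_UUID
print(is_active_worker("u-aaaa", value))  # True  -> becomes Active
print(is_active_worker("u-bbbb", value))  # False -> stays Passive
```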
A replicated
>>> volume is created using the following command:
>>>
>>> gluster volume create test replica 2 server1:/brick1 server2:/brick1
>>> server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2
>>>
>>> In this case we have three replica-sets:
>>>
>>> * server1:/brick1 server2:/brick1
>>> * server2:/brick2 server3:/brick1
>>> * server3:/brick2 server1:/brick2
>>>
>>> The old AFR implementation of node-uuid always returned the uuid of
>>> the node of the first brick, so in this case we will get the uuids
>>> of all three nodes because each of them hosts the first brick of
>>> some replica-set.
>>>
>>> Does this mean that with this configuration all nodes are active?
>>> Is this a problem? Is there any other check to avoid this situation
>>> if it's not good?
>>
>> Yes, all Geo-rep workers will become Active and participate in
>> syncing. Since the changelogs in replica bricks contain the same
>> information, this leads to duplicate syncing and wasted network
>> bandwidth.
>>
>> Node-uuid based Active worker selection has been the default
>> configuration in Geo-rep till now. Geo-rep also has Meta Volume
>> based synchronization for Active workers using lock files (it can be
>> opted into via Geo-rep configuration; with this config node-uuid is
>> not used).
>>
>> Kotresh proposed a solution to configure which worker becomes
>> Active. This gives the admin more control over choosing Active
>> workers, and will become the default configuration from 3.12:
>> https://github.com/gluster/glusterfs/issues/244
>>
>> --
>> Aravinda
>
>>> Xavi
>>>
>>>> Bricks:
>>>>
>>>> <guid>
>>>>
>>>> AFR/EC:
>>>>
>>>> <guid>(<guid>, <guid>)
>>>>
>>>> DHT:
>>>>
>>>> <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>>>>
>>>> In this case, AFR and EC would return the same <guid> they
>>>> returned before the patch, but between '(' and ')' they put the
>>>> full list of guids of all nodes. The first <guid> can be used by
>>>> geo-replication. The list after the first <guid> can be used for
>>>> rebalance.
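Xavi's three-server example above can be reproduced with a short sketch: pair the brick list into replica-sets and collect the node of each set's first brick, which is what the old AFR node-uuid behavior reported (illustrative only; not how Gluster computes this internally):

```python
# Pair a replica-2 brick list into replica-sets and find which nodes
# host a "first brick" (the node whose uuid old AFR would return).

bricks = [
    "server1:/brick1", "server2:/brick1",
    "server2:/brick2", "server3:/brick1",
    "server3:/brick2", "server1:/brick2",
]

replica = 2
replica_sets = [bricks[i:i + replica] for i in range(0, len(bricks), replica)]
first_brick_nodes = {rset[0].split(":")[0] for rset in replica_sets}

print(replica_sets)
print(sorted(first_brick_nodes))  # ['server1', 'server2', 'server3']
```

Because every node hosts the first brick of some replica-set, node-uuid reports all three uuids and every geo-rep worker considers itself Active.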
>>>> Not sure if there's any user of node-uuid above DHT.
>>>>
>>>> Xavi
>>>>
>>>> On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez <xhernan...@datalab.es> wrote:
>>>>
>>>> Hi Pranith,
>>>>
>>>> On 20/06/17 07:53, Pranith Kumar Karampuri wrote:
>>>>
>>>> hi Xavi,
>>>> We all made the mistake of not sending mail about changing the
>>>> behavior of the node-uuid xattr so that rebalance can use multiple
>>>> nodes for doing rebalance. Because of this, on geo-rep all the
>>>> workers are becoming active instead of one per EC/AFR subvolume.
>>>> So we are frantically trying to restore the functionality of
>>>> node-uuid and introduce a new xattr for the new behavior. Sunil
>>>> will be sending out a patch for this.
>>>>
>>>> Wouldn't it be better to change the geo-rep behavior to use the
>>>> new data? I think it's better as it is now, since it gives more
>>>> information to upper layers so that they can take more accurate
>>>> decisions.
>>>>
>>>> Xavi
>>>>
>>>> --
>>>> Pranith
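Xavi's backward-compatible proposal quoted earlier ("<guid>(<guid>, <guid>)") could be split into its two parts like this (a one-level sketch with invented names; it does not handle the nested DHT form and is not actual Gluster code):

```python
# Split "<guid>(<guid>, <guid>)" into the leading guid (kept for old
# consumers such as geo-rep) and the full list in parentheses (usable
# by rebalance). Handles one nesting level only.

def parse_node_uuid(value):
    head, _, rest = value.partition("(")
    first = head.strip()
    inner = rest.rstrip(")").strip()
    all_uuids = [u.strip() for u in inner.split(",")] if inner else []
    return first, all_uuids

first, full_list = parse_node_uuid("U1(U1, U2)")
print(first)      # 'U1'
print(full_list)  # ['U1', 'U2']
```

A plain pre-patch value like "U1" still parses, yielding ("U1", []), which is what makes the format backward compatible for consumers that only read the first field.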
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel