On 20 June 2017 at 20:38, Aravinda <avish...@redhat.com> wrote:
> On 06/20/2017 06:02 PM, Pranith Kumar Karampuri wrote:
>
> Xavi, Aravinda and I had a discussion on #gluster-dev and we agreed to go with the format Aravinda suggested for now. In future we want some more changes for dht to detect which subvolume went down and came back up; at that time we will revisit the solution suggested by Xavi.
>
> Susanth is doing the dht changes
> Aravinda is doing geo-rep changes
>
> Done. Geo-rep patch sent for review: https://review.gluster.org/17582
>

The proposed changes to the node-uuid behaviour (while good) are going to break tiering. Tiering changes will take a little more time to be coded and tested.
As this is a regression for 3.11 and a blocker for 3.11.1, I suggest we go back to the original node-uuid behaviour for now so as to unblock the release, and target the proposed changes for the next 3.11 releases.

Regards,
Nithya

> --
> Aravinda
>
> Thanks to all of you guys for the discussions!
>
> On Tue, Jun 20, 2017 at 5:05 PM, Xavier Hernandez <xhernan...@datalab.es> wrote:
>
>> Hi Aravinda,
>>
>> On 20/06/17 12:42, Aravinda wrote:
>>
>>> I think the following format can be easily adopted by all components.
>>>
>>> UUIDs of a subvolume are separated by space and subvolumes are separated by comma.
>>>
>>> For example, node1 and node2 are a replica pair with UUIDs U1 and U2 respectively, and node3 and node4 are a replica pair with UUIDs U3 and U4 respectively.
>>>
>>> node-uuid can return "U1 U2,U3 U4"
>>>
>> While this is ok for the current implementation, I think it can be insufficient if there are more layers of xlators that need to indicate some sort of grouping. Some representation that can express hierarchy would be better. For example: "(U1 U2) (U3 U4)" (we can use space or comma as a separator).
>>
>>> Geo-rep can split by "," and then split by space and take the first UUID.
>>> DHT can split the value by space or comma and get the list of unique UUIDs.
>>>
>> This doesn't solve the problem I described in the previous email. Some more logic will need to be added to avoid more than one node from each replica-set becoming active. If we have some explicit hierarchy information in the node-uuid value, more decisions can be taken.
>>
>> An initial proposal I made was this:
>>
>> DHT[2](AFR[2,0](NODE(U1), NODE(U2)), AFR[2,0](NODE(U1), NODE(U2)))
>>
>> This is harder to parse, but gives a lot of information: DHT with 2 subvolumes, each subvolume an AFR with replica 2 and no arbiters. It's also easily extensible with any new xlator that changes the layout.
>>
>> However, maybe this is not the moment to do this; we could probably implement it in a new xattr with a better name.
>>
>> Xavi
>>
>>> Another question is about the behavior when a node is down: the existing node-uuid xattr will not return that UUID if the node is down. What is the behavior with the proposed xattr?
>>>
>>> Let me know your thoughts.
>>>
>>> regards
>>> Aravinda VK
>>>
>>> On 06/20/2017 03:06 PM, Aravinda wrote:
>>>
>>>> Hi Xavi,
>>>>
>>>> On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
>>>>
>>>>> Hi Aravinda,
>>>>>
>>>>> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>> Adding more people to get a consensus about this.
>>>>>>
>>>>>> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avish...@redhat.com> wrote:
>>>>>>
>>>>>> regards
>>>>>> Aravinda VK
>>>>>>
>>>>>> On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> adding gluster-devel, Kotresh and Aravinda,
>>>>>>
>>>>>> On 20/06/17 09:45, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez <xhernan...@datalab.es> wrote:
>>>>>>
>>>>>> On 20/06/17 09:31, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> The way geo-replication works is: on each machine, it does a getxattr of node-uuid and checks if its own uuid is present in the list. If it is present then it will consider itself active, otherwise it will be considered passive. With this change we are giving all uuids instead of the first-up subvolume, so all machines think they are ACTIVE, which is bad apparently. So that is the reason. Even I felt bad that we are doing this change.
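For illustration, a minimal sketch (hypothetical helper names, not the actual geo-rep or DHT code) of how the "U1 U2,U3 U4" value proposed above could be consumed so that only one worker per replica-set turns Active while DHT still sees every node:

    def georep_is_active(node_uuid_value, own_uuid):
        # Geo-rep side (per Aravinda's description above): split by ",",
        # then by space, and take the first UUID of each replica-set; a
        # worker is Active only if its own UUID is one of those first entries.
        first_uuids = [group.split()[0]
                       for group in node_uuid_value.split(",")
                       if group.strip()]
        return own_uuid in first_uuids

    def dht_unique_uuids(node_uuid_value):
        # DHT side: split by comma and space and collect the unique UUIDs.
        return sorted(set(node_uuid_value.replace(",", " ").split()))

    print(georep_is_active("U1 U2,U3 U4", "U1"))  # True  -> worker on U1 is Active
    print(georep_is_active("U1 U2,U3 U4", "U2"))  # False -> worker on U2 stays Passive
    print(dht_unique_uuids("U1 U2,U3 U4"))        # ['U1', 'U2', 'U3', 'U4']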
>>>>>> And what about changing the content of node-uuid to include some sort of hierarchy?
>>>>>>
>>>>>> For example:
>>>>>>
>>>>>> a single brick:
>>>>>>
>>>>>> NODE(<guid>)
>>>>>>
>>>>>> AFR/EC:
>>>>>>
>>>>>> AFR[2](NODE(<guid>), NODE(<guid>))
>>>>>> EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>>>>>>
>>>>>> DHT:
>>>>>>
>>>>>> DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)), AFR[2](NODE(<guid>), NODE(<guid>)))
>>>>>>
>>>>>> This gives a lot of information that can be used to take the appropriate decisions.
>>>>>>
>>>>>> I guess that is not backward compatible. Shall I CC gluster-devel and Kotresh/Aravinda?
>>>>>>
>>>>>> Is the change we did backward compatible? If we only require the first field to be a GUID to support backward compatibility, we can use something like this:
>>>>>>
>>>>>> No. But the necessary change can be made to the Geo-rep code as well if the format is changed, since all these are built/shipped together.
>>>>>>
>>>>>> Geo-rep uses node-uuid as follows:
>>>>>>
>>>>>>     uuid_list = listxattr(node-uuid)
>>>>>>     active_node_uuids = uuid_list.split(SPACE)
>>>>>>     active_node_flag = True if self.node_id in active_node_uuids else False
>>>>>>
>>>>> How was this case solved?
>>>>>
>>>>> Suppose we have three servers and 2 bricks in each server. A replicated volume is created using the following command:
>>>>>
>>>>> gluster volume create test replica 2 server1:/brick1 server2:/brick1 server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2
>>>>>
>>>>> In this case we have three replica-sets:
>>>>>
>>>>> * server1:/brick1 server2:/brick1
>>>>> * server2:/brick2 server3:/brick1
>>>>> * server3:/brick2 server1:/brick2
>>>>>
>>>>> The old AFR implementation for node-uuid always returned the uuid of the node of the first brick, so in this case we will get the uuids of all three nodes because each of them hosts the first brick of a replica-set.
>>>>>
>>>>> Does this mean that with this configuration all nodes are active? Is this a problem? Is there any other check to avoid this situation if it's not good?
>>>>>
>>>> Yes, all Geo-rep workers will become Active and participate in syncing. Since changelogs have the same information in replica bricks, this leads to duplicate syncing and wasted network bandwidth.
>>>>
>>>> Node-uuid based Active worker selection is the default configuration in Geo-rep till now. Geo-rep also has Meta Volume based synchronization for Active workers, using lock files (this can be opted into via Geo-rep configuration; with that config node-uuid is not used).
>>>>
>>>> Kotresh proposed a solution to configure which worker becomes Active. This will give more control to the Admin to choose Active workers. This will become the default configuration from 3.12: https://github.com/gluster/glusterfs/issues/244
>>>>
>>>> --
>>>> Aravinda
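To make Xavi's three-server example concrete, a rough sketch (hypothetical names and UUIDs, not real Gluster code) of why the old first-brick behaviour marks every worker Active with that brick ordering:

    # Assumed node UUIDs for the example layout quoted above.
    UUID = {"server1": "U1", "server2": "U2", "server3": "U3"}

    # Replica-sets as created by:
    #   gluster volume create test replica 2 server1:/brick1 server2:/brick1 \
    #       server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2
    replica_sets = [
        [("server1", "brick1"), ("server2", "brick1")],
        [("server2", "brick2"), ("server3", "brick1")],
        [("server3", "brick2"), ("server1", "brick2")],
    ]

    # Old behaviour: one UUID per replica-set, taken from the first brick's node.
    old_node_uuids = {UUID[rs[0][0]] for rs in replica_sets}

    for node, uuid in UUID.items():
        state = "Active" if uuid in old_node_uuids else "Passive"
        print(node, state)  # every node prints "Active" in this layout,
                            # hence duplicate syncing by all geo-rep workers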
>>>>> Xavi
>>>>>
>>>>>> Bricks:
>>>>>>
>>>>>> <guid>
>>>>>>
>>>>>> AFR/EC:
>>>>>>
>>>>>> <guid>(<guid>, <guid>)
>>>>>>
>>>>>> DHT:
>>>>>>
>>>>>> <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>>>>>>
>>>>>> In this case, AFR and EC would return the same <guid> they returned before the patch, but between '(' and ')' they put the full list of guids of all nodes. The first <guid> can be used by geo-replication. The list after the first <guid> can be used for rebalance.
>>>>>>
>>>>>> Not sure if there's any user of node-uuid above DHT.
>>>>>>
>>>>>> Xavi
>>>>>>
>>>>>> On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez <xhernan...@datalab.es> wrote:
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> On 20/06/17 07:53, Pranith Kumar Karampuri wrote:
>>>>>>
>>>>>> hi Xavi,
>>>>>> We all made the mistake of not sending a note about changing the behavior of the node-uuid xattr so that rebalance can use multiple nodes for doing rebalance. Because of this, on geo-rep all the workers are becoming active instead of one per EC/AFR subvolume. So we are frantically trying to restore the functionality of node-uuid and introduce a new xattr for the new behavior. Sunil will be sending out a patch for this.
>>>>>>
>>>>>> Wouldn't it be better to change the geo-rep behavior to use the new data? I think it's better as it is now, since it gives more information to upper layers so that they can take more accurate decisions.
>>>>>>
>>>>>> Xavi
>>>>>>
>>>>>> --
>>>>>> Pranith
>
> --
> Pranith
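As an illustration of Xavi's backward-compatible "<guid>(<guid>, <guid>)" idea quoted above, here is a minimal sketch (assumed format, single AFR/EC level only, not an implemented xattr); old consumers keep using just the leading guid, new consumers can also read the parenthesized member list:

    def split_node_uuid(value):
        # Split "<guid>(<guid>, <guid>)" into the leading guid and the
        # optional member list; a bare "<guid>" has no member list.
        head, sep, rest = value.partition("(")
        first = head.strip()                 # what geo-rep used before the patch
        members = []
        if sep:                              # optional new-style suffix
            members = [u.strip() for u in rest.rstrip(")").split(",") if u.strip()]
        return first, members

    print(split_node_uuid("U1"))           # ('U1', [])           -> plain brick, old format
    print(split_node_uuid("U1(U1, U2)"))   # ('U1', ['U1', 'U2']) -> AFR/EC: first guid for
                                           #    geo-rep, full list for rebalance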
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel