On Wed, Jun 21, 2017 at 1:56 PM, Xavier Hernandez <xhernan...@datalab.es> wrote:
> That's ok. I'm currently unable to write a patch for this on ec. Sunil is working on this patch. ~Karthik > If no one can do it, I can try to do it in 6 - 7 hours... > > Xavi > > > On Wednesday, June 21, 2017 09:48 CEST, Pranith Kumar Karampuri < > pkara...@redhat.com> wrote: > > > > > On Wed, Jun 21, 2017 at 1:00 PM, Xavier Hernandez <xhernan...@datalab.es> > wrote: >> >> I'm ok with reverting node-uuid content to the previous format and create >> a new xattr for the new format. Currently, only rebalance will use it. >> >> Only thing to consider is what can happen if we have a half upgraded >> cluster where some clients have this change and some not. Can rebalance >> work in this situation ? if so, could there be any issue ? > > > I think there shouldn't be any problem, because this is in-memory xattr so > layers below afr/ec will only see node-uuid xattr. > This also gives us a chance to do whatever we want to do in future with > this xattr without any problems about backward compatibility. > > You can check https://review.gluster.org/#/c/17576/3/xlators/cluster/afr/ > src/afr-inode-read.c@1507 for how karthik implemented this in AFR (this > got merged accidentally yesterday, but looks like this is what we are > settling on) > > >> >> Xavi >> >> >> On Wednesday, June 21, 2017 06:56 CEST, Pranith Kumar Karampuri < >> pkara...@redhat.com> wrote: >> >> >> >> >> On Wed, Jun 21, 2017 at 10:07 AM, Nithya Balachandran < >> nbala...@redhat.com> wrote: >>> >>> >>> On 20 June 2017 at 20:38, Aravinda <avish...@redhat.com> wrote: >>>> >>>> On 06/20/2017 06:02 PM, Pranith Kumar Karampuri wrote: >>>> >>>> Xavi, Aravinda and I had a discussion on #gluster-dev and we agreed to >>>> go with the format Aravinda suggested for now and in future we wanted some >>>> more changes for dht to detect which subvolume went down came back up, at >>>> that time we will revisit the solution suggested by Xavi. >>>> >>>> Susanth is doing the dht changes >>>> Aravinda is doing geo-rep changes >>>> >>>> Done. Geo-rep patch sent for review https://review.gluster.org/17582 >>>> >>>> >>> >>> The proposed changes to the node-uuid behaviour (while good) are going >>> to break tiering . Tiering changes will take a little more time to be coded >>> and tested. >>> >>> As this is a regression for 3.11 and a blocker for 3.11.1, I suggest we >>> go back to the original node-uuid behaviour for now so as to unblock the >>> release and target the proposed changes for the next 3.11 releases. >>> >> >> Let me see if I understand the changes correctly. We are restoring the >> behavior of node-uuid xattr and adding a new xattr for parallel rebalance >> for both afr and ec, correct? Otherwise that is one more regression. If >> yes, we will also wait for Xavi's inputs. Jeff accidentally merged the afr >> patch yesterday which does these changes. If everyone is in agreement, we >> will leave it as is and add similar changes in ec as well. If we are not in >> agreement, then we will let the discussion progress :-) >> >> >>> >>> >>> Regards, >>> Nithya >>> >>>> -- >>>> Aravinda >>>> >>>> >>>> >>>> Thanks to all of you guys for the discussions! >>>> >>>> On Tue, Jun 20, 2017 at 5:05 PM, Xavier Hernandez < >>>> xhernan...@datalab.es> wrote: >>>>> >>>>> Hi Aravinda, >>>>> >>>>> On 20/06/17 12:42, Aravinda wrote: >>>>>> >>>>>> I think following format can be easily adopted by all components >>>>>> >>>>>> UUIDs of a subvolume are seperated by space and subvolumes are >>>>>> separated >>>>>> by comma >>>>>> >>>>>> For example, node1 and node2 are replica with U1 and U2 UUIDs >>>>>> respectively and >>>>>> node3 and node4 are replica with U3 and U4 UUIDs respectively >>>>>> >>>>>> node-uuid can return "U1 U2,U3 U4" >>>>> >>>>> >>>>> While this is ok for current implementation, I think this can be >>>>> insufficient if there are more layers of xlators that require to indicate >>>>> some sort of grouping. Some representation that can represent hierarchy >>>>> would be better. For example: "(U1 U2) (U3 U4)" (we can use spaces or >>>>> comma >>>>> as a separator). >>>>> >>>>>> >>>>>> >>>>>> Geo-rep can split by "," and then split by space and take first UUID >>>>>> DHT can split the value by space or comma and get unique UUIDs list >>>>> >>>>> >>>>> This doesn't solve the problem I described in the previous email. Some >>>>> more logic will need to be added to avoid more than one node from each >>>>> replica-set to be active. If we have some explicit hierarchy information >>>>> in >>>>> the node-uuid value, more decisions can be taken. >>>>> >>>>> An initial proposal I made was this: >>>>> >>>>> DHT[2](AFR[2,0](NODE(U1), NODE(U2)), AFR[2,0](NODE(U1), NODE(U2))) >>>>> >>>>> This is harder to parse, but gives a lot of information: DHT with 2 >>>>> subvolumes, each subvolume is an AFR with replica 2 and no arbiters. It's >>>>> also easily extensible with any new xlator that changes the layout. >>>>> >>>>> However maybe this is not the moment to do this, and probably we could >>>>> implement this in a new xattr with a better name. >>>>> >>>>> Xavi >>>>> >>>>>> >>>>>> >>>>>> Another question is about the behavior when a node is down, existing >>>>>> node-uuid xattr will not return that UUID if a node is down. What is >>>>>> the >>>>>> behavior with the proposed xattr? >>>>>> >>>>>> Let me know your thoughts. >>>>>> >>>>>> regards >>>>>> Aravinda VK >>>>>> >>>>>> On 06/20/2017 03:06 PM, Aravinda wrote: >>>>>>> >>>>>>> Hi Xavi, >>>>>>> >>>>>>> On 06/20/2017 02:51 PM, Xavier Hernandez wrote: >>>>>>>> >>>>>>>> Hi Aravinda, >>>>>>>> >>>>>>>> On 20/06/17 11:05, Pranith Kumar Karampuri wrote: >>>>>>>>> >>>>>>>>> Adding more people to get a consensus about this. >>>>>>>>> >>>>>>>>> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avish...@redhat.com >>>>>>>>> <mailto:avish...@redhat.com>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> regards >>>>>>>>> Aravinda VK >>>>>>>>> >>>>>>>>> >>>>>>>>> On 06/20/2017 01:26 PM, Xavier Hernandez wrote: >>>>>>>>> >>>>>>>>> Hi Pranith, >>>>>>>>> >>>>>>>>> adding gluster-devel, Kotresh and Aravinda, >>>>>>>>> >>>>>>>>> On 20/06/17 09:45, Pranith Kumar Karampuri wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez >>>>>>>>> <xhernan...@datalab.es <mailto:xhernan...@datalab.es> >>>>>>>>> <mailto:xhernan...@datalab.es >>>>>>>>> <mailto:xhernan...@datalab.es>>> wrote: >>>>>>>>> >>>>>>>>> On 20/06/17 09:31, Pranith Kumar Karampuri wrote: >>>>>>>>> >>>>>>>>> The way geo-replication works is: >>>>>>>>> On each machine, it does getxattr of node-uuid >>>>>>>>> and >>>>>>>>> check if its >>>>>>>>> own uuid >>>>>>>>> is present in the list. If it is present then >>>>>>>>> it >>>>>>>>> will consider >>>>>>>>> it active >>>>>>>>> otherwise it will be considered passive. With >>>>>>>>> this >>>>>>>>> change we are >>>>>>>>> giving >>>>>>>>> all uuids instead of first-up subvolume. So all >>>>>>>>> machines think >>>>>>>>> they are >>>>>>>>> ACTIVE which is bad apparently. So that is the >>>>>>>>> reason. Even I >>>>>>>>> felt bad >>>>>>>>> that we are doing this change. >>>>>>>>> >>>>>>>>> >>>>>>>>> And what about changing the content of node-uuid to >>>>>>>>> include some >>>>>>>>> sort of hierarchy ? >>>>>>>>> >>>>>>>>> for example: >>>>>>>>> >>>>>>>>> a single brick: >>>>>>>>> >>>>>>>>> NODE(<guid>) >>>>>>>>> >>>>>>>>> AFR/EC: >>>>>>>>> >>>>>>>>> AFR[2](NODE(<guid>), NODE(<guid>)) >>>>>>>>> EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>)) >>>>>>>>> >>>>>>>>> DHT: >>>>>>>>> >>>>>>>>> DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)), >>>>>>>>> AFR[2](NODE(<guid>), >>>>>>>>> NODE(<guid>))) >>>>>>>>> >>>>>>>>> This gives a lot of information that can be used to >>>>>>>>> take the >>>>>>>>> appropriate decisions. >>>>>>>>> >>>>>>>>> >>>>>>>>> I guess that is not backward compatible. Shall I CC >>>>>>>>> gluster-devel and >>>>>>>>> Kotresh/Aravinda? >>>>>>>>> >>>>>>>>> >>>>>>>>> Is the change we did backward compatible ? if we only >>>>>>>>> require >>>>>>>>> the first field to be a GUID to support backward >>>>>>>>> compatibility, >>>>>>>>> we can use something like this: >>>>>>>>> >>>>>>>>> No. But the necessary change can be made to Geo-rep code as >>>>>>>>> well if >>>>>>>>> format is changed, Since all these are built/shipped together. >>>>>>>>> >>>>>>>>> Geo-rep uses node-id as follows, >>>>>>>>> >>>>>>>>> list = listxattr(node-uuid) >>>>>>>>> active_node_uuids = list.split(SPACE) >>>>>>>>> active_node_flag = True if self.node_id exists in >>>>>>>>> active_node_uuids >>>>>>>>> else False >>>>>>>> >>>>>>>> >>>>>>>> How was this case solved ? >>>>>>>> >>>>>>>> suppose we have three servers and 2 bricks in each server. A >>>>>>>> replicated volume is created using the following command: >>>>>>>> >>>>>>>> gluster volume create test replica 2 server1:/brick1 server2:/brick1 >>>>>>>> server2:/brick2 server3:/brick1 server3:/brick1 server1:/brick2 >>>>>>>> >>>>>>>> In this case we have three replica-sets: >>>>>>>> >>>>>>>> * server1:/brick1 server2:/brick1 >>>>>>>> * server2:/brick2 server3:/brick1 >>>>>>>> * server3:/brick2 server2:/brick2 >>>>>>>> >>>>>>>> Old AFR implementation for node-uuid always returned the uuid of the >>>>>>>> node of the first brick, so in this case we will get the uuid of the >>>>>>>> three nodes because all of them are the first brick of a >>>>>>>> replica-set. >>>>>>>> >>>>>>>> Does this mean that with this configuration all nodes are active ? >>>>>>>> Is >>>>>>>> this a problem ? Is there any other check to avoid this situation if >>>>>>>> it's not good ? >>>>>>> >>>>>>> Yes all Geo-rep workers will become Active and participate in >>>>>>> syncing. >>>>>>> Since changelogs will have the same information in replica bricks >>>>>>> this >>>>>>> will lead to duplicate syncing and consuming network bandwidth. >>>>>>> >>>>>>> Node-uuid based Active worker is the default configuration in Geo-rep >>>>>>> till now, Geo-rep also has Meta Volume based syncronization for >>>>>>> Active >>>>>>> worker using lock files.(Can be opted using Geo-rep configuration, >>>>>>> with this config node-uuid will not be used) >>>>>>> >>>>>>> Kotresh proposed a solution to configure which worker to become >>>>>>> Active. This will give more control to Admin to choose Active >>>>>>> workers, >>>>>>> This will become default configuration from 3.12 >>>>>>> https://github.com/gluster/glusterfs/issues/244 >>>>>>> >>>>>>> -- >>>>>>> Aravinda >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Xavi >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Bricks: >>>>>>>>> >>>>>>>>> <guid> >>>>>>>>> >>>>>>>>> AFR/EC: >>>>>>>>> <guid>(<guid>, <guid>) >>>>>>>>> >>>>>>>>> DHT: >>>>>>>>> <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...)) >>>>>>>>> >>>>>>>>> In this case, AFR and EC would return the same <guid> they >>>>>>>>> returned before the patch, but between '(' and ')' they >>>>>>>>> put the >>>>>>>>> full list of guid's of all nodes. The first <guid> can be >>>>>>>>> used >>>>>>>>> by geo-replication. The list after the first <guid> can be >>>>>>>>> used >>>>>>>>> for rebalance. >>>>>>>>> >>>>>>>>> Not sure if there's any user of node-uuid above DHT. >>>>>>>>> >>>>>>>>> Xavi >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Xavi >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Jun 20, 2017 at 12:46 PM, Xavier >>>>>>>>> Hernandez >>>>>>>>> <xhernan...@datalab.es >>>>>>>>> <mailto:xhernan...@datalab.es> >>>>>>>>> <mailto:xhernan...@datalab.es >>>>>>>>> <mailto:xhernan...@datalab.es>> >>>>>>>>> <mailto:xhernan...@datalab.es >>>>>>>>> <mailto:xhernan...@datalab.es> >>>>>>>>> <mailto:xhernan...@datalab.es >>>>>>>>> <mailto:xhernan...@datalab.es>>>> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi Pranith, >>>>>>>>> >>>>>>>>> On 20/06/17 07:53, Pranith Kumar Karampuri >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> hi Xavi, >>>>>>>>> We all made the mistake of not >>>>>>>>> sending about changing >>>>>>>>> behavior of >>>>>>>>> node-uuid xattr so that rebalance can >>>>>>>>> use >>>>>>>>> multiple nodes >>>>>>>>> for doing >>>>>>>>> rebalance. Because of this on geo-rep >>>>>>>>> all >>>>>>>>> the workers >>>>>>>>> are becoming >>>>>>>>> active instead of one per EC/AFR >>>>>>>>> subvolume. >>>>>>>>> So we are >>>>>>>>> frantically trying >>>>>>>>> to restore the functionality of >>>>>>>>> node-uuid >>>>>>>>> and introduce >>>>>>>>> a new >>>>>>>>> xattr for >>>>>>>>> the new behavior. Sunil will be >>>>>>>>> sending out >>>>>>>>> a patch for >>>>>>>>> this. >>>>>>>>> >>>>>>>>> >>>>>>>>> Wouldn't it be better to change geo-rep >>>>>>>>> behavior >>>>>>>>> to use the >>>>>>>>> new data >>>>>>>>> ? I think it's better as it's now, since it >>>>>>>>> gives more >>>>>>>>> information >>>>>>>>> to upper layers so that they can take more >>>>>>>>> accurate decisions. >>>>>>>>> >>>>>>>>> Xavi >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Pranith >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Pranith >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Pranith >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Pranith >>>>>>>> >>>>>>>> >>>> >>>> >>>> -- >>>> Pranith >>>> >>>> >> >> >> -- >> Pranith >> >> >> >> > > > > -- > Pranith > > > >
_______________________________________________ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel