Hi Atin,

You are right! I was using version 3.5 in production. And when I checked the Gluster source code, I looked at the wrong commit (not the latest commit in the master branch).
The solution I proposed is already implemented. It is in the function gd_peerinfo_find_from_addrinfo, in the file xlators/mgmt/glusterd/src/glusterd-peer-utils.c.

Thanks for your tip! And sorry for any inconvenience.

--
*Rarylson Freitas*

On Thu, Jul 2, 2015 at 2:01 AM, Atin Mukherjee <[email protected]> wrote:

> Which gluster version are you using? Better peer identification feature
> (available 3.6 onwards) should tackle this problem IMO.
>
> ~Atin
>
> On 07/02/2015 10:05 AM, Rarylson Freitas wrote:
> > Hi,
> >
> > Recently, my company needed to change the hostnames used in our Gluster
> > pool.
> >
> > At first, we had two Gluster nodes called storage1 and storage2. Our
> > volumes used two bricks: storage1:/MYVOLUME and storage2:/MYVOLUME. We
> > put the storage1 and storage2 IPs in the /etc/hosts file of our nodes
> > and of our client servers.
> >
> > After some time, more client servers started using Gluster, and we
> > discovered that using hostnames without a domain (via /etc/hosts) on all
> > client servers is a pain in the a$$ :(. So we decided to change them to
> > something like storage1.mydomain.com and storage2.mydomain.com.
> >
> > Remember that, at this point, we already had some volumes (with bricks):
> >
> > $ gluster volume info MYVOL
> > [...]
> > Brick1: storage1:/MYDIR
> > Brick2: storage2:/MYDIR
> >
> > For simplicity, let's consider that we had two Gluster nodes, each one
> > with the following entries in /etc/hosts:
> >
> > 10.10.10.1 storage1
> > 10.10.10.2 storage2
> >
> > To implement the hostname changes, we changed the /etc/hosts file to:
> >
> > 10.10.10.1 storage1 storage1.mydomain.com
> > 10.10.10.2 storage2 storage2.mydomain.com
> >
> > And we ran on storage1:
> >
> > $ gluster peer probe storage2.mydomain.com
> > peer probe: success
> >
> > Everything worked well for some time, but glusterd started to fail
> > after any reboot:
> >
> > $ service glusterfs-server status
> > glusterfs-server start/running, process 14714
> > $ service glusterfs-server restart
> > glusterfs-server stop/waiting
> > glusterfs-server start/running, process 14860
> > $ service glusterfs-server status
> > glusterfs-server stop/waiting
> >
> > To start the service again, it was necessary to roll back the hostname1
> > config to storage2 in /var/lib/glusterd/peers/OUR_UUID.
> >
> > After some trial and error, we discovered that if we changed the order
> > of the entries in /etc/hosts and repeated the process, everything worked.
> >
> > That is, from:
> >
> > 10.10.10.1 storage1 storage1.mydomain.com
> > 10.10.10.2 storage2 storage2.mydomain.com
> >
> > To:
> >
> > 10.10.10.1 storage1.mydomain.com storage1
> > 10.10.10.2 storage2.mydomain.com storage2
> >
> > And run:
> >
> > gluster peer probe storage2.mydomain.com
> > service glusterfs-server restart
> >
> > So we checked the glusterd debug log and the GlusterFS source code and
> > discovered that the big secret was the function
> > glusterd_friend_find_by_hostname, in the file
> > xlators/mgmt/glusterd/src/glusterd-utils.c.
> > This function is called for each brick that isn't a local brick and
> > does the following:
> >
> > - It checks if the brick hostname is equal to some peer hostname;
> >   - If it is, this peer is our wanted friend;
> > - If not, it gets the brick IP (resolving the hostname with the
> >   function getaddrinfo) and checks if the brick IP is equal to the
> >   peer hostname;
> >   - That is, we could have run gluster peer probe 10.10.10.2, since
> >     the brick IP (storage2 resolves to 10.10.10.2) would then be equal
> >     to the peer "hostname" (10.10.10.2);
> >   - If it is, this peer is our wanted friend;
> > - If not, it gets the reverse lookup of the brick IP (using the
> >   function getnameinfo) and checks if that name is equal to the peer
> >   hostname;
> >   - This is why changing the order of the entries in /etc/hosts
> >     worked as a workaround for us;
> > - If not, it returns an error (and glusterd will fail).
> >
> > However, we think that comparing the brick IP (resolving the brick
> > hostname) with the peer IP (resolving the peer hostname) would be a
> > simpler and more comprehensive solution. Even when the brick and the
> > peer have different hostnames, it would work as long as they resolve
> > to the same IP.
> >
> > The solution could be:
> >
> > - It checks if the brick hostname is equal to some peer hostname;
> >   - If it is, this peer is our wanted friend;
> > - If not, it gets both the brick IPs (resolving the brick hostname
> >   with the function getaddrinfo) and the peer IPs (resolving the peer
> >   hostname) and, for each IP pair, checks if a brick IP is equal to a
> >   peer IP;
> >   - If it is, this peer is our wanted friend;
> > - If not, it returns an error (and glusterd will fail).
> >
> > What do you think about it?
> >
> > --
> > *Rarylson Freitas*
> > Computer Engineer
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > [email protected]
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> >
>
> --
> ~Atin
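
For anyone reading this thread later, the lookup proposed in the quoted message can be sketched roughly as below. This is only an illustration in Python, not the glusterd code (which is C, and which I have not reproduced here). The names find_peer_for_brick, resolve_addresses, HOSTS and fake_resolve are hypothetical, and the HOSTS table simply mirrors the /etc/hosts entries from the example so the sketch runs without touching DNS.

```python
import socket
from typing import Callable, Iterable, Optional, Set


def resolve_addresses(hostname: str) -> Set[str]:
    """Return every IP address getaddrinfo reports for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def find_peer_for_brick(
    brick_host: str,
    peer_hosts: Iterable[str],
    resolve: Callable[[str], Set[str]] = resolve_addresses,
) -> Optional[str]:
    """Return the peer that owns brick_host, or None if no peer matches."""
    peers = list(peer_hosts)

    # Step 1: the brick hostname is literally one of the peer hostnames.
    for peer in peers:
        if peer == brick_host:
            return peer

    # Step 2: resolve both sides and look for any address in common.
    brick_ips = resolve(brick_host)
    for peer in peers:
        if brick_ips & resolve(peer):
            return peer

    # Step 3: nothing matched; the caller treats this as an error.
    return None


# Hypothetical stand-in for /etc/hosts, taken from the example above.
HOSTS = {
    "storage1": {"10.10.10.1"},
    "storage1.mydomain.com": {"10.10.10.1"},
    "storage2": {"10.10.10.2"},
    "storage2.mydomain.com": {"10.10.10.2"},
}


def fake_resolve(hostname: str) -> Set[str]:
    """Resolver that only consults the HOSTS table (no network access)."""
    return HOSTS.get(hostname, set())


# The brick was created as "storage2", the peer was probed by its FQDN.
print(find_peer_for_brick(
    "storage2",
    ["storage1.mydomain.com", "storage2.mydomain.com"],
    fake_resolve,
))  # prints: storage2.mydomain.com
```

Because both names are resolved and compared as address sets, the order of the names on each /etc/hosts line no longer matters, which is the point of the proposal. Passing the resolver as a parameter is just a convenience of this sketch so it can be tried against a fixed table.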
