Late update to this thread, but just so you don't go down the wrong road on 
this: it was almost definitely not an iptables problem. iptables is never turned 
on here, and even if it were, absolutely no custom rules would have been 
running.

James Burnash, Unix Engineering

-----Original Message-----
From: Pranith Kumar. Karampuri [mailto:[email protected]] 
Sent: Monday, March 21, 2011 10:37 PM
To: Mohit Anchlia
Cc: Burnash, James; [email protected]
Subject: Re: [Gluster-users] What does this error mean?

hi,
    Whenever a peer goes down, all the other machines in the cluster keep 
trying to reconnect to it, and when the peer comes back up the reconnection 
will succeed. The only times we have seen problems are a change of IP address 
and an issue with iptables. We will have to investigate what might have 
happened. Since the restart fixed the problem, it is not a change of IP 
address. We shall try reproducing it with an iptables issue.
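
As a rough illustration of the retry behaviour described above (this is not
glusterd's actual code), here is a minimal Python sketch that opens a fresh TCP
connection on every attempt instead of reusing an old socket. Port 24007 is
taken from the log line quoted in this thread; the function name probe_peer and
the retry parameters are hypothetical.

```python
import socket
import time

# Management port as it appears in the quoted log line (10.20.72.157:24007).
GLUSTERD_PORT = 24007


def probe_peer(host, port=GLUSTERD_PORT, attempts=3, delay=1.0, timeout=2.0):
    """Try to open a new TCP connection to host:port, retrying on failure.

    Returns (True, None) on success, or (False, last_error_text) if every
    attempt failed. Hypothetical helper, for illustration only.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            # A new socket is created for each attempt; nothing is reused.
            with socket.create_connection((host, port), timeout=timeout):
                return True, None
        except OSError as exc:
            last_error = str(exc)
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False, last_error
```

Calling probe_peer("jc1letgfs6") from jc1letgfs5, for example, would show
whether the management port is reachable at the TCP level, independently of
what glusterd itself reports.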

Pranith.

----- Original Message -----
From: "Mohit Anchlia" <[email protected]>
To: "James Burnash" <[email protected]>, [email protected]
Sent: Tuesday, March 22, 2011 12:54:52 AM
Subject: Re: [Gluster-users] What does this error mean?

I also think there might be a bug where gluster continues to use a bad
socket instead of trying to re-establish the connection. Not sure why that
is, or how that works when one machine fails and comes back up. Can
someone from the gluster developer team look at this and provide some
insight?
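
For anyone comparing peer views across nodes, as in the quoted output further
down, a small parser can make the disagreements easier to spot. This is only a
sketch: it assumes the `gluster peer status` text format exactly as quoted in
this thread (GlusterFS 3.1.1), and the function names are hypothetical.

```python
import re

# Sample copied from the jc1letgfs8 output quoted in this thread.
SAMPLE = """\
Number of Peers: 3

Hostname: jc1letgfs6
Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
State: Peer Rejected (Connected)

Hostname: jc1letgfs7
Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
State: Peer in Cluster (Connected)

Hostname: 10.20.72.156
Uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca
State: Peer in Cluster (Connected)
"""

STATE_RE = re.compile(r"State:\s*(.+?)\s*\((\w+)\)\s*$")


def parse_peer_status(text):
    """Turn peer status text into a list of dicts, one per peer."""
    peers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            # Each "Hostname:" line starts a new peer record.
            current = {"hostname": line.split(":", 1)[1].strip()}
            peers.append(current)
        elif line.startswith("Uuid:") and current is not None:
            current["uuid"] = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and current is not None:
            match = STATE_RE.match(line)
            if match:
                current["state"], current["connection"] = match.groups()
    return peers


def unhealthy_peers(peers):
    """Return peers that are not both 'Peer in Cluster' and 'Connected'."""
    return [
        p for p in peers
        if p.get("state") != "Peer in Cluster"
        or p.get("connection") != "Connected"
    ]
```

Running unhealthy_peers(parse_peer_status(SAMPLE)) on the sample above picks
out jc1letgfs6, which that node reports as "Peer Rejected". Running the same
check on each node's output gives a quick way to see where the views differ.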

On Mon, Mar 21, 2011 at 12:20 PM, Mohit Anchlia <[email protected]> wrote:
> Is node 5 still showing "Disconnected" for node 6?
>
> On Mon, Mar 21, 2011 at 12:08 PM, Burnash, James <[email protected]> wrote:
>> After 'service glusterd restart' on node 6:
>>
>> root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
>> Number of Peers: 3
>>
>> Hostname: jc1letgfs6
>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>> State: Peer in Cluster (Disconnected)
>>
>> Hostname: jc1letgfs7
>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>> State: Peer in Cluster (Connected)
>>
>> Hostname: jc1letgfs8
>> Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
>> State: Peer in Cluster (Connected)
>>
>> BUT ... after 'service glusterd restart' on node 5:
>>
>> root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
>> Number of Peers: 3
>>
>> Hostname: jc1letgfs7
>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>> State: Peer in Cluster (Connected)
>>
>> Hostname: jc1letgfs8
>> Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
>> State: Peer in Cluster (Connected)
>>
>> Hostname: jc1letgfs6
>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>> State: Peer in Cluster (Connected)
>>
>> Works now. Thanks so much. I suspect a race condition of some sort, though 
>> exactly what I'll leave up to the devs.
>>
>> -----Original Message-----
>> From: Mohit Anchlia [mailto:[email protected]]
>> Sent: Monday, March 21, 2011 2:57 PM
>> To: Burnash, James; [email protected]
>> Subject: Re: [Gluster-users] What does this error mean?
>>
>> At this point can you do /etc/init.d/gluster stop and then start and
>> see if this changes anything? Or do you see the same behaviour? I am
>> thinking gluster might have tried to start too soon on reboot.
>>
>> On Mon, Mar 21, 2011 at 11:43 AM, Burnash, James <[email protected]> wrote:
>>> Short answers - yes all on the same subnet.
>>> Every host can ping the others
>>> Iptables shows empty entries for all filters
>>>
>>> Details are here - http://pastebin.com/eKtRMbGE
>>>
>>> I did explicitly turn the iptables off again, and then checked again:
>>>
>>> jc1letgfs5
>>> Firewall is stopped.
>>>
>>> jc1letgfs6
>>> Firewall is stopped.
>>>
>>> jc1letgfs7
>>> Firewall is stopped.
>>>
>>> jc1letgfs8
>>> Firewall is stopped.
>>>
>>> Thanks,
>>>
>>> James
>>>
>>> -----Original Message-----
>>> From: Mohit Anchlia [mailto:[email protected]]
>>> Sent: Monday, March 21, 2011 2:25 PM
>>> To: Burnash, James
>>> Cc: [email protected]
>>> Subject: Re: [Gluster-users] What does this error mean?
>>>
>>> Are they in the same subnet? What happens if you ping these hosts
>>> individually? Do they ping?
>>>
>>> I looked closely at the error you posted, and "connection to
>>> 10.20.72.157:24007 failed (No route to host)" points to either a firewall
>>> issue or a switch issue on the network. A ping test from each host to
>>> every other host will be helpful.
>>>
>>> Can you post results of ping and also "service iptables status" from each 
>>> node?
>>>
>>> On Mon, Mar 21, 2011 at 11:16 AM, Burnash, James <[email protected]> 
>>> wrote:
>>>> A little more information:
>>>>
>>>> From the original (first peer node):
>>>> root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
>>>> Number of Peers: 3
>>>>
>>>> Hostname: jc1letgfs6
>>>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>>>> State: Peer in Cluster (Disconnected)
>>>>
>>>> Hostname: jc1letgfs7
>>>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>>>> State: Peer in Cluster (Connected)
>>>>
>>>> Hostname: jc1letgfs8
>>>> Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
>>>> State: Peer in Cluster (Connected)
>>>>
>>>>
>>>> From the problem node:
>>>> *** NOTE - only one Peer seen
>>>> root@jc1letgfs6:~# gluster peer status
>>>> Number of Peers: 1
>>>>
>>>> Hostname: 10.20.72.156
>>>> Uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca
>>>> State: Peer in Cluster (Connected)
>>>>
>>>>
>>>> From a different peer node:
>>>> root@jc1letgfs8:~# gluster peer status
>>>> Number of Peers: 3
>>>>
>>>> Hostname: jc1letgfs6
>>>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>>>> State: Peer Rejected (Connected)
>>>>
>>>> Hostname: jc1letgfs7
>>>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>>>> State: Peer in Cluster (Connected)
>>>>
>>>> Hostname: 10.20.72.156
>>>> Uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca
>>>> State: Peer in Cluster (Connected)
>>>>
>>>> -----Original Message-----
>>>> From: [email protected] 
>>>> [mailto:[email protected]] On Behalf Of Burnash, James
>>>> Sent: Monday, March 21, 2011 2:05 PM
>>>> To: Mohit Anchlia
>>>> Cc: [email protected]
>>>> Subject: Re: [Gluster-users] What does this error mean?
>>>>
>>>> I did do this, and nothing in particular stands out.
>>>>
>>>> I'll exercise it some more, and see if we can get something that will at 
>>>> least point in the proper direction.
>>>>
>>>> I suspect that another reboot of the affected machine will fix this 
>>>> condition - but it won't help me understand the root problem the next time 
>>>> this happens.
>>>>
>>>> Thanks,
>>>>
>>>> James
>>>>
>>>> -----Original Message-----
>>>> From: Mohit Anchlia [mailto:[email protected]]
>>>> Sent: Monday, March 21, 2011 12:40 PM
>>>> To: Burnash, James
>>>> Cc: [email protected]
>>>> Subject: Re: [Gluster-users] What does this error mean?
>>>>
>>>> Can you turn on DEBUG and see if there is something that stands out?
>>>>
>>>> On Mon, Mar 21, 2011 at 9:34 AM, Burnash, James <[email protected]> 
>>>> wrote:
>>>>> Does anybody have any clue as to why this is happening? The problem has 
>>>>> persisted for several days now, but I can't find anything at all in the 
>>>>> logs to possibly explain why this is so.
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] 
>>>>> [mailto:[email protected]] On Behalf Of Burnash, James
>>>>> Sent: Wednesday, March 16, 2011 9:10 AM
>>>>> To: [email protected]
>>>>> Subject: [SPAM?] [Gluster-users] What does this error mean?
>>>>> Importance: Low
>>>>>
>>>>> Hello.
>>>>>
>>>>> After purposely crashing node jc1letgfs6 (via 'echo b>/proc/sysrq-trigger') 
>>>>> to test mirroring, I am still seeing the state "Disconnected" for that node, 
>>>>> even after it has rebooted and is back online, when I execute the following 
>>>>> command on the first storage node:
>>>>>
>>>>> root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
>>>>> Number of Peers: 3
>>>>>
>>>>> Hostname: jc1letgfs6
>>>>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>>>>> State: Peer in Cluster (Disconnected)
>>>>>
>>>>> Hostname: jc1letgfs7
>>>>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>>>>> State: Peer in Cluster (Disconnected)
>>>>>
>>>>> Hostname: jc1letgfs8
>>>>> Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
>>>>> State: Peer in Cluster (Connected)
>>>>>
>>>>> This is running on 4 servers with CentOS 5.5 (x86_64), GlusterFS 3.1.1
>>>>>
>>>>> Here is the volume info:
>>>>>
>>>>> # gluster volume info
>>>>>
>>>>> Volume Name: test-pfs-ro1
>>>>> Type: Distributed-Replicate
>>>>> Status: Started
>>>>> Number of Bricks: 4 x 2 = 8
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: jc1letgfs5:/export/read-only/g01
>>>>> Brick2: jc1letgfs6:/export/read-only/g01
>>>>> Brick3: jc1letgfs5:/export/read-only/g02
>>>>> Brick4: jc1letgfs6:/export/read-only/g02
>>>>> Brick5: jc1letgfs7:/export/read-only/g01
>>>>> Brick6: jc1letgfs8:/export/read-only/g01
>>>>> Brick7: jc1letgfs7:/export/read-only/g02
>>>>> Brick8: jc1letgfs8:/export/read-only/g02
>>>>> Options Reconfigured:
>>>>> performance.stat-prefetch: on
>>>>> performance.cache-size: 2GB
>>>>> network.ping-timeout: 10
>>>>>
>>>>> Even with this error, mirroring functions as expected, and the node is 
>>>>> recognized and utilized, as can be seen in this log fragment from 
>>>>> jc1letgfs5: /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>>>>>
>>>>> [2011-03-13 23:51:31.458329] E [socket.c:1656:socket_connect_finish] management: connection to 10.20.72.157:24007 failed (No route to host)
>>>>> [2011-03-13 23:53:49.42170] I [glusterd3_1-mops.c:172:glusterd3_1_friend_add_cbk] glusterd: Received ACC from uuid: cd590fad-022c-4b9a-97f5-3262080d772d, host: jc1letgfs6, port: 0
>>>>> [2011-03-13 23:53:49.42204] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
>>>>> [2011-03-13 23:53:49.42320] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
>>>>> [2011-03-13 23:53:49.42336] I [glusterd-handler.c:2267:glusterd_handle_friend_update] glusterd: Received friend update from uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>>>>> [2011-03-13 23:53:49.42359] I [glusterd-handler.c:2312:glusterd_handle_friend_update] : Received uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca, hostname:10.20.72.156
>>>>> [2011-03-13 23:53:49.42412] I [glusterd-handler.c:2315:glusterd_handle_friend_update] : Received my uuid as Friend
>>>>>
>>>>>
>>>>> Any pointers or help would be appreciated.
>>>>>
>>>>> James Burnash, Unix Engineering
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> [email protected]
>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users