Thanks, let me try the same.

On Wednesday, March 8, 2017 at 5:08:44 AM UTC+5:30, seabos wrote:
>
> So just a thought here... When you're dealing with geographically 
> distributed nodes, at least in the case of RMQ, you can run into TCP 
> timeouts/truncation on long-running TCP connections that sit idle with no 
> traffic between the endpoints. The keepalive time is a kernel-defined value. See 
> http://unix.stackexchange.com/questions/316020/in-linux-does-proc-sys-net-ipv4-tcp-keepalive-time-has-impact-on-both-client
>
> What can happen is that if your keepalive time is longer than the 
> firewall's truncation value (idle timeout), you will see all sorts of 
> randomness in responses as services disconnect and reconnect, depending on 
> when the firewall truncates them and when the default TCP keepalive 
> "re-vivifies" the connection. 
>
> We have very large geographically distributed collectives and so this is 
> one of the things we've had to address. (also we have very large 
> collectives of over 70K, and have tested up to 250K nodes in single DCs)
>
> Here's a suggestion to try when troubleshooting: set the TCP keepalive 
> time to less than 1000 seconds (a little more than 16 minutes) in the 
> kernel (pretty sure the default value is 7200 seconds, or 2 hours). 
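
A minimal sketch of the keepalive check suggested above, assuming a Linux host. The helper name keepalive_beats_firewall and the 3600-second firewall idle timeout are illustrative assumptions, not values from this thread. Note also that the kernel setting only affects sockets that have SO_KEEPALIVE enabled.

```shell
# Hypothetical helper: succeeds when the keepalive time is shorter than the
# firewall idle timeout, i.e. a probe goes out before the firewall drops the flow.
# Usage: keepalive_beats_firewall FIREWALL_IDLE_SECS [KEEPALIVE_SECS]
keepalive_beats_firewall() {
    fw_idle=$1
    # Fall back to the running kernel's value when no second argument is given.
    keepalive=${2:-$(cat /proc/sys/net/ipv4/tcp_keepalive_time)}
    [ "$keepalive" -lt "$fw_idle" ]
}

# Example (3600 is an assumed firewall idle timeout; check your own):
#   keepalive_beats_firewall 3600 || echo "keepalive time is too long"
# To lower the value until the next reboot (needs root):
#   sysctl -w net.ipv4.tcp_keepalive_time=900
```

To keep the setting across reboots, the same key can go in /etc/sysctl.conf as net.ipv4.tcp_keepalive_time = 900.
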
>
> A way you could test this assertion is to collect two sample lists using 
> mco ping 
>
> mco ping --nodes=bla > /tmp/ListA
> wait 5 minutes
> mco ping --nodes=bla > /tmp/ListB
>
> Check which hosts are in List A but NOT in List B, and try to mco ping 
> one of those hosts directly to see if it still does not respond. If it 
> does not, log on to the host, restart the mcollective service (agent), and 
> mco ping that host again. If it works the second time, that's a reasonable 
> indication that the long-running TCP connection is being truncated by a 
> firewall of some sort. 
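
The List A / List B comparison above can be sketched as follows. This assumes each reply line in the mco ping output has the form "<hostname>   time=<n> ms" and that the statistics summary lines should be ignored. The function names replied_hosts and missing_hosts are made up for illustration.

```shell
# Print the unique hostnames that replied in a saved mco ping capture.
# Only lines containing "time=" are treated as replies (assumed output format).
replied_hosts() {
    awk '/time=/ { print $1 }' "$1" | LC_ALL=C sort -u
}

# Print hosts that replied in the first capture but not in the second.
# Usage: missing_hosts /tmp/ListA /tmp/ListB
missing_hosts() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    replied_hosts "$1" > "$a"
    replied_hosts "$2" > "$b"
    # comm needs sorted input; -23 keeps only lines unique to the first file.
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

Each host it prints can then be pinged on its own, for example with mco ping -I <hostname>, before and after restarting the mcollective service on that host.
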
>
> hope that helps
>
>
>
>
> On Friday, March 3, 2017 at 11:29:41 PM UTC-5, [email protected] wrote:
>>
>> Hi,
>>
>> I want to use mcollective across geographically distributed locations 
>> and have configured an ActiveMQ network of brokers for that. After the 
>> configuration everything works as expected. The only problem is that when 
>> I run an mco query against multiple servers, the connection drops for some 
>> time if many of them have not responded. 
>>
>> For example, I have 1000 clients connected to the ActiveMQ broker at each 
>> location, and I am deploying mcollective to many servers. I have the full 
>> list of servers (458) that I am deploying to. When I run mco ping or any 
>> other mco query with --nodes=<server_list_file> across locations, I get 
>> responses from 243 servers, which means mcollective is configured 
>> successfully on those, and no response from the rest, which means 
>> mcollective is not configured on them yet. The problem is that because 
>> many servers do not respond, a query run afterwards against the connected 
>> servers gets no response either. If I wait a few minutes and then run the 
>> same query against the working servers, I do get responses. I suspect that 
>> when many servers fail to respond to a query, the middleware takes time to 
>> clear pending queues, or something similar is happening. Any idea how to 
>> tune this?
>>
>>
>> Regards
>> Ravi
>>
>
