Thanks Kostas and Jonathan for your suggestions.

I spent quite a few hours on this and in the end decided that
gmetad was working as designed and that adding a specific timeout on
the socket connection wasn't needed.

This is because the kernel already times out socket connections that
fail, or rather it times out each failed attempt and retries several
times before it finally gives up. The data collection thread then
sleeps for a bit before trying again.

My specific problem was that, after sleeping, the data thread just
retried the same host it had failed on last time, which was the
instance that had been terminated. This would inevitably fail at some
point and the data thread would appear to hang.

The solution was to modify gmetad to poll the most recently launched
instance by looking at the GMOND_STARTED value, which works well.
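
A rough sketch of that selection, which is not the actual patch: it
assumes GMOND_STARTED is a start timestamp in seconds and that a larger
value means a more recently launched instance. The struct host_info
type and the newest_host function are hypothetical and do not match
gmetad's real data structures.

```c
#include <stddef.h>

/* Hypothetical record; gmetad's real host structures differ. */
struct host_info {
    const char   *name;
    unsigned long gmond_started;  /* assumed: gmond start time, epoch seconds */
};

/* Return the host with the largest GMOND_STARTED value,
 * or NULL if the list is empty. */
const struct host_info *newest_host(const struct host_info *hosts, size_t n)
{
    const struct host_info *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (best == NULL || hosts[i].gmond_started > best->gmond_started)
            best = &hosts[i];
    }
    return best;
}
```

The data thread would then try the returned host first, instead of
the host it failed on last time.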

Hopefully I'll find time to submit this code in a branch in the coming
days/weeks.

--Nick.

On Tue, Feb 5, 2013 at 5:28 PM, Kostas Georgiou
<k.georg...@atreides.org.uk> wrote:
> On Fri, Jan 25, 2013 at 12:45:10PM +0000, Nicholas Satterly wrote:
>
>> Does anyone have any ideas of how the connection could at least be
>> timed out? Keep in mind that the gmetad is multi-threaded so I'm
>> pretty sure that rules out the use of SIGALRM.
> ...
>> How could a 2 second timeout be enforced on this connect()?
>
> You set O_NONBLOCK on the socket before the connect, then run select
> with a 2 sec timeout on the socket. From there, if you have a connection
> (depending on whether select hit the timeout and what getsockopt for
> SO_ERROR returns), you set the socket back to blocking.
>
> Did you see any failures when the machine went away after the connect?
> I can't remember if we time out while we are reading data from the
> socket.
>
> ------------------------------------------------------------------------------
> Free Next-Gen Firewall Hardware Offer
> Buy your Sophos next-gen firewall before the end March 2013
> and get the hardware for free! Learn more.
> http://p.sf.net/sfu/sophos-d2d-feb
> _______________________________________________
> Ganglia-developers mailing list
> Ganglia-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-developers



-- 
gpg: using PGP trust model
pub   4096R/1EE38BD9 2013-01-06 [expires: 2018-01-06]
      Key fingerprint = 3EE9 550D D9D8 DB65 58C2  B58D CE78 EC6C 1EE3 8BD9
uid                  Nicholas Satterly (Debian Key) <nfsatte...@gmail.com>
sub   4096R/23804EE9 2013-01-06 [expires: 2018-01-06]
