Until recently I wasn't controlling the start order of ec2-run-user-data
and ganglia-monitor, so they were starting at the same 'time'.  Yesterday I
fixed that, so that now ec2-run-user-data starts at S02 and ganglia-monitor
at S03.  I thought the issue might be exactly what you describe
(ganglia-monitor starting before ec2-run-user-data has finished altering
gmond.conf), but the error still happened today.
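
Since the error only shows up at deployment time and I can't leave
debug_level on, one way to at least spot the affected machine right after a
redeploy might be to read the XML dump from the bastion's gmond and compare it
with the hosts that should be there. This is only a rough sketch: it assumes
the bastion's gmond still has a tcp_accept_channel on the default port 8649,
and the function names and host names are made up for illustration. The names
in the expected list have to match whatever gmond reports in the NAME
attribute.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the whole XML dump that gmond writes on its tcp_accept_channel.

    Port 8649 is assumed to be unchanged from the default gmond.conf.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def reported_hosts(xml_bytes):
    """Return the set of NAME attributes of every HOST element in the dump."""
    root = ET.fromstring(xml_bytes)
    return {host.get("NAME") for host in root.iter("HOST")}


def missing_hosts(expected, xml_bytes):
    """Return the expected hosts that do not appear in the dump, sorted."""
    return sorted(set(expected) - reported_hosts(xml_bytes))
```

Running something like missing_hosts(expected, fetch_gmond_xml("bastion"))
a minute or two after a redeploy ("bastion" being a placeholder host name)
would list the machines that need a closer look or a gmond restart.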

Also, I suspect (but don't know for sure) that the gmond.conf will actually
be invalid before ec2-run-user-data has run - I've altered it to have flags
that get replaced with valid values.
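
To rule that out, the init script could refuse to start gmond while any flag
is still unreplaced. A minimal sketch of such a check follows. The @@NAME@@
flag syntax and the /etc/ganglia/gmond.conf path are assumptions for
illustration only, so the pattern and path would need adjusting to the real
template.

```python
import re

# Hypothetical flag syntax (@@NAME@@); adjust to match the real template.
PLACEHOLDER = re.compile(r"@@[A-Z0-9_]+@@")


def find_unreplaced_placeholders(conf_text):
    """Return the distinct flags still present, in order of first appearance."""
    seen = []
    for token in PLACEHOLDER.findall(conf_text):
        if token not in seen:
            seen.append(token)
    return seen


def conf_is_ready(path="/etc/ganglia/gmond.conf"):
    """True once the file at `path` contains no unreplaced flags.

    The default path is an assumption about where gmond.conf lives.
    """
    with open(path) as fh:
        return not find_unreplaced_placeholders(fh.read())
```

If conf_is_ready() returns False when ganglia-monitor is about to start, that
would point at the userdata step not having finished, rather than at gmond
itself.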

On Thu, Nov 13, 2014 at 12:20 PM, Joe Gracyk <jgra...@marketlive.com> wrote:

> Hi, Sam -
>
> We've got a similar deployment (EC2 instances unicasting to a per-AZ
> gmetad) that we're managing with Puppet, and I can't say we've seen
> anything like that.
>
> How are you automating your redeployments and gmond configurations? Could
> your gmond instances be starting up before their unicast configurations
> have been applied? If you had some sort of race condition where gmond could
> be installed and started, and only *then* get the conf file written, I'd
> expect gmond to merrily chug along, fruitlessly trying to multicast into
> the void.
>
> Good luck!
>
> On Wed, Nov 12, 2014 at 2:41 PM, Sam Barham <s.bar...@adinstruments.com>
> wrote:
>
>> We've got about 100 machines running on AWS EC2s, with Ganglia for
>> monitoring.  Because we are on Amazon, we can't use multicast, so the
>> architecture we have is each cluster has a Bastion machine, and each other
>> machine in the cluster has gmond send its data to the bastion, which
>> gmetad then queries.  All standard and sensible and it works just fine.
>>
>> Except that occasionally, when I redeploy the machines in a cluster (but
>> not the bastion - that stays running through this operation), just one of
>> the machines will not send data through to the bastion or something.  All I
>> can say for sure is that gmond is running OK on the problem machine, there
>> are no error logs on the problem machine, the bastion or the gmetad
>> machine, but the machine doesn't appear in gmetad.  If I go into the
>> problem machine and restart gmond, it reconnects just fine and appears in
>> gmetad.
>>
>> Which machine has the error is random - it's not a particular type of
>> machine or anything.  Because the error only shows up rarely, and only at
>> deployment time, I can't really turn on debug_level to investigate.
>>
>> Also, some of the configuration values in gmond.conf are filled in when
>> the userdata is run.  I've edited /etc/init.d/ganglia-monitor so that it
>> starts up immediately after the userdata has run, just in case that matters.
>>
>> Any ideas?
>>
>> Sam
>>
>>
>> _______________________________________________
>> Ganglia-general mailing list
>> Ganglia-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>>
>>
>
>
> --
>
> Joe Gracyk | *DevOps Developer*
> 707-780-1848 | jgra...@marketlive.com
>