Daniel Pocock wrote:
> 
> I've observed a few DNS related issues recently.  Currently, gmetad uses 
> the host names returned in the XML from gmond to create and locate RRDs 
> for each host.
> 
> Some of the problems and possible ideas:
> 
> - capitalisation is inconsistent and can even change; the RFC specifies that 
> it is not important: maybe gmetad should convert everything it receives 
> to lower case?
> 
> - hosts can move amongst domains, or be reachable under multiple domains 
> in some weird setups: maybe there needs to be an option to tell gmetad 
> to drop the domain and just assume that host names are globally unique?
> 
> - if DNS is unavailable when gmond comes up, it starts recording the 
> received metrics using IP addresses instead of host names, and the IP 
> addresses become `stuck' in gmond: maybe it needs to keep retrying the 
> DNS name, rather than becoming stuck on the IP address?
> 
> - maybe use UUIDs instead of hostnames?  The UUID could be generated by 
> `gmond -t' and stored in each gmond.conf.  gmetad would create a 
> directory for each UUID, and maybe a symlink from the hostname for 
> convenience.
> 
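
The first two suggestions (case folding and dropping the domain) could be
applied as one normalisation step before gmetad builds an RRD path. Below is
a minimal sketch in Python. The `normalize_host()` helper and its
`strip_domain` switch are hypothetical names used only for illustration,
not existing gmetad options.

```python
import ipaddress


def normalize_host(name: str, strip_domain: bool = False) -> str:
    """Return a canonical RRD key for a host name reported by gmond."""
    # Ignore surrounding whitespace and a trailing root dot.
    name = name.strip().rstrip(".")
    try:
        # IP literals (what gmond falls back to without DNS) are
        # canonicalised but never split on dots.
        return str(ipaddress.ip_address(name))
    except ValueError:
        pass
    # DNS names compare case-insensitively (RFC 4343), so fold case.
    name = name.lower()
    if strip_domain:
        # Hypothetical option: keep only the first label, assuming
        # short host names are unique across every domain.
        name = name.split(".", 1)[0]
    return name
```

For example, `normalize_host("Node01.Example.COM")` gives `node01.example.com`,
and with `strip_domain=True` it gives `node01`. Case folding alone is always
safe. Stripping the domain is only correct if short names really are
globally unique, which is exactly the assumption the second idea asks
gmetad to make.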

I think it's better to store RRDs by IP address. The host names can be
resolved by the web frontend and cached. The cache can be refreshed by
removing old names, or edited by hand. For server farms that use Kerberos,
a reverse DNS lookup could return uninteresting, unfriendly formatted names;
in that situation, the IP-to-hostname translation could be delegated to a
plugin or something similar that reads a database or a flat file.
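
A rough sketch of that split in Python. RRDs stay keyed by IP, and the
frontend resolves display names through a small cache that checks a
hand-edited flat file first and only falls back to reverse DNS. The file
format (one `IP name` pair per line, `#` starts a comment) and the
`HostNameCache` class are assumptions for illustration. The resolver is
injectable, so a site could plug in a database lookup instead.

```python
import socket
import time
from typing import Callable, Dict, Optional, Tuple


def reverse_dns(ip: str) -> Optional[str]:
    """Default resolver: PTR lookup through the system resolver."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None


class HostNameCache:
    """Map IP -> display name; RRD paths keep using the IP."""

    def __init__(self, overrides_path: Optional[str] = None,
                 resolver: Callable[[str], Optional[str]] = reverse_dns,
                 ttl: float = 3600.0):
        self.resolver = resolver
        self.ttl = ttl
        self.cache: Dict[str, Tuple[str, float]] = {}
        self.overrides: Dict[str, str] = (
            self._load(overrides_path) if overrides_path else {})

    @staticmethod
    def _load(path: str) -> Dict[str, str]:
        """Read 'IP name' pairs, ignoring comments and blank lines."""
        names: Dict[str, str] = {}
        with open(path) as f:
            for line in f:
                fields = line.split("#", 1)[0].split()
                if len(fields) >= 2:
                    names[fields[0]] = fields[1]
        return names

    def name(self, ip: str) -> str:
        if ip in self.overrides:  # hand-edited names always win
            return self.overrides[ip]
        hit = self.cache.get(ip)
        if hit and time.monotonic() - hit[1] < self.ttl:
            return hit[0]
        # If the lookup fails, show the IP itself. The entry expires
        # after `ttl`, so a later call retries DNS instead of staying
        # stuck on the address.
        resolved = self.resolver(ip) or ip
        self.cache[ip] = (resolved, time.monotonic())
        return resolved

    def forget(self, ip: str) -> None:
        """Drop a stale cached name so the next call resolves again."""
        self.cache.pop(ip, None)
```

Because the RRD key never changes, renaming a host only touches the override
file or the cache, never the stored data.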

In our server farm, hosts change roles now and then and are moved among
clusters. Every time we move a host to another cluster, it shows up as
"dead" in the old cluster while it is alive in the new one. Restarting
gmetad is really a bad option, you know. So the current "store by cluster
and hostname" layout is not convenient. OK, I know it's convenient when
some other host B takes over a name that host A once used; that layout is
service oriented.

I think UUIDs are not a good idea, especially considering that OSes get
reinstalled while roles remain: a reinstalled host would get a new UUID and
leave behind a forever-dead entry for a machine that has actually come back
to life. We don't need that.

The main problem is the "view". Cacti has more potential in this respect.
We created a view system that makes use of its graph.php, and it works
great. Cacti's database backend makes our tweaks easier (but the database
is Cacti's strong point and weak point at the same time).

Here we plan to create a "view" system that uses the Ganglia web frontend's
graph.php, but the main problem is that a host can appear in multiple
clusters' RRD directories, and deciding which RRD should be used is really
hard. Maybe we should put our thousands of nodes in the "unspecified"
cluster, but then ganglia-webfrontend becomes useless.
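
One possible heuristic for that choice (an assumption, not something the
frontend does today): pick the copy gmetad is still writing to, i.e. the
RRD with the newest modification time. A sketch in Python, assuming the
`<rrd_root>/<cluster>/<host>/<metric>.rrd` directory layout; `freshest_rrd`
is a hypothetical helper.

```python
import glob
import os
from typing import Optional


def freshest_rrd(rrd_root: str, host: str, metric: str) -> Optional[str]:
    """Return the most recently written <metric>.rrd for <host> across
    all cluster directories under rrd_root, or None if there is none."""
    # Escape the literal parts so only the cluster level is a wildcard.
    pattern = os.path.join(glob.escape(rrd_root), "*",
                           glob.escape(host), glob.escape(metric) + ".rrd")
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    # A host moved between clusters leaves a stale copy behind; the copy
    # still being updated has the newest mtime.
    return max(candidates, key=os.path.getmtime)
```

On its own this cannot tell a moved host from one that is down everywhere,
so a view would probably also want a staleness threshold on the returned
file's age.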

BTW, we now use IP-based reverse DNS names ;) so the names are effectively
the same as the IPs.

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
