Re: [Ganglia-general] Configuration problem after failover

2015-03-26 Thread Loris Bennett
Rick Cobb rick_c...@ieee.org writes:

 Generally what you do is have all the compute nodes send to a gmond
 server on the administrative nodes, and then have gmetad poll that
 gmond. You use a unicast setup on the compute node gmonds to do this
 (IIRC, you can have them send to more than one for redundancy); you
 may as well make them deaf when you do that.

 If you want the administrative nodes to appear in a separate cluster,
 run more than one gmond on them. One listens on the compute gmond port
 and is mute, the other one listens on the administrative gmond port
 (which you'll have to assign differently than 8648 and 8649) and is
 normal (neither deaf nor mute).

 -- ReC

Thanks for all the help.  I have now got this working.  The only thing
that isn't quite correct is that one of the admin nodes is listed along
with the compute nodes, although no data has been collected here
(however, the data *is* collected in the admin node data source).

So is there still something wrong with my configuration or is this just
an artefact caused by a past misconfiguration, which has left an entry
for the admin node in the RRD files for the compute nodes?

Cheers,

Loris

-- 
This signature is currently under construction.


--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general


Re: [Ganglia-general] Configuration problem after failover

2015-03-26 Thread Rick Cobb
Well, I should defer to folks who have actually configured this stuff in
the last year or three, but IIRC, this could happen because the admin node
gmond that is acting as the collector for all the compute nodes is
reporting statistics to itself.  Its other gmond is also doing that.

The configuration file for the compute collector gmond should be set up
not to collect any statistics on the local node. IIRC, that may require a
bit more work than just making it mute; e.g., you may have to remove any
parts that identify what to collect on that machine.
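
If it helps, here is roughly what such a stripped-down collector config could look like. Treat it as a sketch under assumptions, not a tested file: the cluster name is only chosen to match the Compute_Nodes data source, 8649 is the port that data source polls in Loris's gmetad.conf, and whether dropping the collection groups is really needed on top of mute is exactly the part I am unsure about.

```
# gmond.conf for the compute collector on the admin node (sketch)
globals {
  # never send metrics for the local host
  mute = yes
  # keep listening for the compute nodes
  deaf = no
}

cluster {
  # assumed name, chosen to match the data_source label
  name = "Compute_Nodes"
}

# receive the unicast metrics sent by the compute nodes
udp_recv_channel {
  port = 8649
}

# answer gmetad's polls with the aggregated XML
tcp_accept_channel {
  port = 8649
}

# No udp_send_channel and no collection_group sections:
# nothing is gathered or announced for the admin node itself.
```

The second gmond on the same machine would be started with its own file via gmond's -c option.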

OTOH, if you haven't restarted the main admin gmetad (or the compute
collector gmond), then your hypothesis about stale RRD files might be
right.  If you don't care about what's been collected so far, you can try
just removing the files for those nodes within that cluster's directory and
restarting gmetad.
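
To find the files in question, look at rrd_rootdir in gmetad.conf. The fragment below is only a sketch: the path shown is the usual default, but your package may have set it differently, so check your own file before removing anything.

```
# gmetad.conf (sketch): where gmetad writes its RRD files
rrd_rootdir "/var/lib/ganglia/rrds"

# Resulting layout on disk, one directory per cluster and per host:
#   /var/lib/ganglia/rrds/<cluster>/<host>/<metric>.rrd
# A stale admin-node entry would be the <host> directory that sits
# under the compute cluster's directory.
```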

-- ReC





Re: [Ganglia-general] Configuration problem after failover

2015-03-24 Thread Loris Bennett
 On 03/20/2015 10:23 AM, Loris Bennett wrote:
 Hi,

 I have the following in my gmetad.conf

 data_source Admin_Nodes 10 admin:8648
 data_source Compute_Nodes 10 admin:8649

 and when I look at the ports in use, I have

 $ netstat -plane | egrep 'gmon|gme'
 tcp   0  0  0.0.0.0:8651  0.0.0.0:*  LISTEN  493  256095111  62544/gmetad
 tcp   0  0  0.0.0.0:8652  0.0.0.0:*  LISTEN  493  256095112  62544/gmetad
 unix  2  [ ]  DGRAM  256095117  62544/gmetad

 Should I expect to see gmetad listening on ports 8648 and 8649 as well?

 Cheers,

 Loris

Vladimir Vuksan vli...@veus.hr writes:

 No. Gmetad listens on two ports by default: 8651 and 8652.

 8648 and 8649 are ports for the gmond which gmetad is polling.


OK, I think I have a general problem with my setup.  I have:

- 3 admin nodes, which during normal operation are always up
- 100 compute nodes, any or all of which could be powered down during
  normal operation

Setting up the data source for the admin nodes seems straightforward,
as they are normally all up.  However, how should it be defined for the
compute nodes?  I would like to do something like

  data_source Compute_Nodes 10 node*.test.cluster:8649

but this produces the error:

  we failed to resolve data source name node*.test.cluster

I could add one of the admin nodes to the cluster of compute nodes, but
then it would no longer be able to send its own data to the cluster of
admin nodes.
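
For reference, the error above arises because gmetad treats each entry of a data_source line as one literal host name and does not expand wildcards. A line may, however, list several explicit hosts. The fragment below is only a sketch, and the node names in it are made up for illustration:

```
# gmetad.conf (sketch; host names are hypothetical)
# Each entry after the polling interval is one literal host[:port].
# gmetad tries them in order and uses the first one that answers,
# so the extra entries act as failover sources, not as a pattern.
data_source "Compute_Nodes" 10 node001.test.cluster:8649 node002.test.cluster:8649
```

With nodes that may all be powered down, even a list like this can end up with no reachable source.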

Is there a standard way of dealing with this case?

Cheers,

Loris

-- 
This signature is currently under construction.




Re: [Ganglia-general] Configuration problem after failover

2015-03-24 Thread Rick Cobb
Generally what you do is have all the compute nodes send to a gmond server
on the administrative nodes, and then have gmetad poll that gmond.  You use
a unicast setup on the compute node gmonds to do this (IIRC, you can have
them send to more than one for redundancy); you may as well make them
deaf when you do that.

If you want the administrative nodes to appear in a separate cluster, run
more than one gmond on them. One listens on the compute gmond port and is
mute, the other one listens on the administrative gmond port (which
you'll have to assign differently than 8648 and 8649) and is normal
(neither deaf nor mute).
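
As a rough illustration of the compute-node side, a gmond.conf along the following lines might be used. This is only a sketch under assumptions: the host names admin1 and admin2 and the cluster name are placeholders that do not come from this thread, 8649 is simply the port that the Compute_Nodes data source polls in Loris's gmetad.conf, and only the settings relevant here are shown.

```
# gmond.conf on each compute node (sketch; host names are hypothetical)
globals {
  # do not listen for metrics from other nodes
  deaf = yes
  # still send this node's own metrics
  mute = no
  # resend metric metadata periodically; useful with unicast
  send_metadata_interval = 30
}

cluster {
  # assumed name, chosen to match the data_source label
  name = "Compute_Nodes"
}

# One unicast channel per collector; the second one gives redundancy.
udp_send_channel {
  # hypothetical collector host
  host = admin1
  port = 8649
}
udp_send_channel {
  # hypothetical second collector host
  host = admin2
  port = 8649
}

# The collection_group sections stay as shipped (omitted here).
```

gmetad then polls only the collector, so it does not matter which compute nodes happen to be powered down.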

-- ReC





Re: [Ganglia-general] Configuration problem after failover

2015-03-20 Thread Vladimir Vuksan
No. Gmetad listens on two ports by default: 8651 and 8652.

8648 and 8649 are ports for the gmond which gmetad is polling.
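
In gmetad.conf terms the split looks roughly like this. It is a sketch: the data_source lines are the ones from the message quoted below, and the two port settings are shown with their default values rather than copied from any real file.

```
# gmetad.conf (sketch)
# gmond endpoints that gmetad connects out to and polls
data_source "Admin_Nodes" 10 admin:8648
data_source "Compute_Nodes" 10 admin:8649

# ports that gmetad itself listens on
xml_port 8651
interactive_port 8652
```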


On 03/20/2015 10:23 AM, Loris Bennett wrote:
 Hi,

 I have the following in my gmetad.conf

 data_source Admin_Nodes 10 admin:8648
 data_source Compute_Nodes 10 admin:8649

 and when I look at the ports in use, I have

 $ netstat -plane | egrep 'gmon|gme'
 tcp   0  0  0.0.0.0:8651  0.0.0.0:*  LISTEN  493  256095111  62544/gmetad
 tcp   0  0  0.0.0.0:8652  0.0.0.0:*  LISTEN  493  256095112  62544/gmetad
 unix  2  [ ]  DGRAM  256095117  62544/gmetad

 Should I expect to see gmetad listening on ports 8648 and 8649 as well?

 Cheers,

 Loris


