guys-
there are two ways to solve this problem:
(simple) make all XML ports blocking (with a timeout)
(hard) see below
important: we cannot have the XML port put gmond to sleep when EAGAIN
is returned. that would stop data collection and announcements until
the remote client is completely serviced or timed out; all of gmond
would be paused.
the way to fix this problem correctly is to look at the
poll_listen_channels() and the process_tcp_accept_channel() functions
in gmond.c.
in gmond there is an apr_pollset_t *listen_channels global variable.
this pollset contains an entry for each "tcp_accept_channel" and each
"udp_recv_channel" that are defined in the gmond.conf. the "enum
Ganglia_channel_types" defines all the types of channels in this
pollset (currently TCP_ACCEPT_CHANNEL and UDP_RECV_CHANNEL).
the "hard" solution in steps (i know how much steve wagner loves
lists)...
1. add TCP_CLIENT_CHANNEL to the list of channel types in the
Ganglia_channel_types enum
2. modify the process_tcp_accept_channel() function:
2a. instead of writing the data out to the socket
immediately, write it to memory allocated from a memory pool
2b. create a new TCP_CLIENT_CHANNEL and add it to the
listen_channels pollset
3. in the poll_listen_channels() function, add a case for
TCP_CLIENT_CHANNEL to handle the events where sending to the client
will not block (or where the client has disconnected)
4. whenever a TCP_CLIENT_CHANNEL event occurs, try to write to the
remote client. on disconnect (or permanent failure), release all the
memory in the client's memory pool and remove the channel from the
pollset
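
steps 1 and 3 could be sketched roughly as follows. this is only an
illustration, not gmond's actual code: the order of the enum members,
the channel_t struct, the dispatch_result codes and dispatch_channel()
are all hypothetical stand-ins. the real switch would live inside
poll_listen_channels() and work on the entries of the apr_pollset_t.

```c
#include <assert.h>
#include <stddef.h>

/* channel types; TCP_CLIENT_CHANNEL is the proposed addition (step 1) */
enum Ganglia_channel_types {
  TCP_ACCEPT_CHANNEL,
  UDP_RECV_CHANNEL,
  TCP_CLIENT_CHANNEL
};

/* hypothetical stand-in for the data attached to a pollset entry */
typedef struct {
  enum Ganglia_channel_types type;
  void *data;
} channel_t;

/* hypothetical result codes, used here only to make the sketch testable */
typedef enum {
  HANDLED_ACCEPT,
  HANDLED_UDP,
  HANDLED_CLIENT,
  HANDLED_UNKNOWN
} dispatch_result;

/* the per-event switch with the new case added (step 3) */
dispatch_result dispatch_channel(const channel_t *ch)
{
  assert(ch != NULL);
  switch (ch->type) {
  case TCP_ACCEPT_CHANNEL:
    /* step 2: accept, render the XML into a buffer, then register a
       TCP_CLIENT_CHANNEL instead of writing to the socket right away */
    return HANDLED_ACCEPT;
  case UDP_RECV_CHANNEL:
    /* unchanged: read an incoming metric packet */
    return HANDLED_UDP;
  case TCP_CLIENT_CHANNEL:
    /* step 4: the client socket is writable (or has gone away) */
    return HANDLED_CLIENT;
  }
  return HANDLED_UNKNOWN;
}
```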
that is the "hard" solution. the big drawback of the hard solution
is that it will be more memory intensive (since we write the data to
memory before we start writing to the client). why do we need to do
that, you ask? the data in the hash_table may have changed underneath
the client and we have no way to handle that (gmond has no
continuation code). having everything in memory means we just update
an "int bytesWritten" attribute in the client structure. when
bytesWritten == bytesTotal, we close the client connection and clean up.
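
the bytesWritten bookkeeping might look something like the sketch
below. it uses plain POSIX write() on a non-blocking descriptor rather
than the APR socket calls gmond uses, and the tcp_client struct, the
client_status codes and client_flush() are hypothetical names made up
for the example (the counters are size_t here instead of int). a real
version would also have to make sure a write to a disconnected client
does not kill gmond with SIGPIPE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical per-client state; only the two counters come from the text */
typedef struct {
  int fd;              /* non-blocking client socket */
  const char *buf;     /* XML snapshot taken at accept time */
  size_t bytesTotal;   /* size of the snapshot */
  size_t bytesWritten; /* how much the client has received so far */
} tcp_client;

typedef enum { CLIENT_DONE, CLIENT_AGAIN, CLIENT_FAILED } client_status;

/* call this when the pollset reports that the client socket is writable */
client_status client_flush(tcp_client *c)
{
  assert(c != NULL && c->bytesWritten <= c->bytesTotal);
  while (c->bytesWritten < c->bytesTotal) {
    ssize_t n = write(c->fd, c->buf + c->bytesWritten,
                      c->bytesTotal - c->bytesWritten);
    if (n > 0) {
      c->bytesWritten += (size_t)n;   /* partial writes are fine */
    } else if (n < 0 && errno == EINTR) {
      continue;                       /* interrupted, just retry */
    } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
      return CLIENT_AGAIN;            /* stay in the pollset, never sleep */
    } else {
      return CLIENT_FAILED;           /* disconnect or permanent failure */
    }
  }
  return CLIENT_DONE;                 /* bytesWritten == bytesTotal */
}
```

on CLIENT_DONE or CLIENT_FAILED the caller would close the connection,
free the client's memory pool and remove the channel from the pollset;
on CLIENT_AGAIN it just goes back to polling.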
the "simple" approach has the downside that a client can hold up
gmond for up to the timeout period. the "hard" approach will make
gmond more memory intensive.
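
for the "simple" approach, one way to bound a blocking send is a send
timeout on the socket. the sketch below uses the POSIX SO_SNDTIMEO
socket option directly; gmond itself goes through APR, where
apr_socket_timeout_set() would be the natural place to do this. the
helper name set_send_timeout() is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/* hypothetical helper: limit how long a blocking send on fd may stall.
   returns 0 on success, -1 on error (errno is set by setsockopt).
   note that a value of 0 seconds means "no timeout" for SO_SNDTIMEO. */
int set_send_timeout(int fd, long seconds)
{
  struct timeval tv;

  assert(seconds >= 0);
  tv.tv_sec = seconds;
  tv.tv_usec = 0;
  return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
}
```

the timeout applies to each send call, not to the whole response, so
gmond would still be stalled for at least that long by a slow client.
that is the tradeoff of the simple approach.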
of course, over time we can implement both and let the
configuration decide. i can update the code sometime this weekend to
support the "simple" solution and open the debate on the "hard" one.
--
[EMAIL PROTECTED]
http://massie.us