guys-

there are two ways to solve this problem:
    (simple) make all XML ports blocking (with a timeout)
    (hard) see below
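
for reference, a minimal sketch of what the "simple" option could look like. this is not gmond code: gmond uses APR (where apr_socket_timeout_set() plays this role), and i am using plain POSIX sockets here only so the example stands alone. set_send_timeout() and send_all() are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* hypothetical helper: keep the socket blocking, but bound each send() call */
int set_send_timeout(int fd, long seconds)
{
    struct timeval tv;

    assert(seconds >= 0);
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
}

/* hypothetical helper: blocking write of the whole buffer.
 * returns 0 on success, -1 on timeout or error (the caller drops the client).
 * note the timeout applies to each send() call, not to the whole transfer. */
int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n > 0) {
            sent += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;   /* interrupted by a signal, retry */
        return -1;      /* EAGAIN/EWOULDBLOCK here means the timeout expired */
    }
    return 0;
}
```

while send_all() is waiting on a slow client nothing else in the process runs, which is exactly the downside of this option.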

important: we cannot have the XML port put gmond to sleep when EAGAIN is returned. that would stop data collection and announcements until the remote client is completely serviced or times out. all of gmond would be paused.

the way to fix this problem correctly is to look at the poll_listen_channels() and the process_tcp_accept_channel() functions in gmond.c.

in gmond there is an apr_pollset_t *listen_channels global variable. this pollset contains an entry for each "tcp_accept_channel" and each "udp_recv_channel" defined in gmond.conf. the "enum Ganglia_channel_types" defines all the types of channels in this pollset (currently TCP_ACCEPT_CHANNEL and UDP_RECV_CHANNEL).

the "hard" solution in steps (i know how much steve wagner loves lists)...

1. add TCP_CLIENT_CHANNEL to the list of channel types in Ganglia_channel_types enum
2. modify the process_tcp_accept_channel() function:
   2a. instead of writing the data out to the socket immediately, write it to memory allocated from a memory pool
   2b. create a new TCP_CLIENT_CHANNEL and add it to the listen_channels pollset
3. in the poll_listen_channels() function, add a case for TCP_CLIENT_CHANNEL to handle the event that sending to the client will not block (or that the client has disconnected)
4. whenever a TCP_CLIENT_CHANNEL event occurs, try to write to the remote client. on disconnect (or permanent failure), release all the memory in the client's memory pool and remove the channel from the pollset
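
a rough sketch of how steps 1 and 3 might look. this is not the real gmond.c: the spelling of the two existing enum values is assumed, and channel_poll_events(), channel_dispatch() and the channel_action codes are hypothetical names. i use poll(2) flags so the example stands alone; the APR equivalents would be APR_POLLIN and APR_POLLOUT.

```c
#include <assert.h>
#include <poll.h>

/* the first two values are the existing channel types (spelling assumed);
 * TCP_CLIENT_CHANNEL is the proposed addition from step 1 */
enum Ganglia_channel_types {
    TCP_ACCEPT_CHANNEL,
    UDP_RECV_CHANNEL,
    TCP_CLIENT_CHANNEL
};

/* hypothetical: what the poll loop should do for a channel that is ready */
enum channel_action { DO_ACCEPT, DO_RECV, DO_SEND, DO_CLOSE };

/* which events a channel of the given type is registered for */
short channel_poll_events(enum Ganglia_channel_types type)
{
    switch (type) {
    case TCP_ACCEPT_CHANNEL:    /* a new connection shows up as readable */
    case UDP_RECV_CHANNEL:      /* an incoming datagram shows up as readable */
        return POLLIN;
    case TCP_CLIENT_CHANNEL:    /* wake up only when a send will not block */
        return POLLOUT;
    }
    return 0;
}

/* the extra case from step 3, reduced to a decision function */
enum channel_action channel_dispatch(enum Ganglia_channel_types type,
                                     short revents)
{
    assert(revents != 0);       /* only channels reported as ready get here */

    switch (type) {
    case TCP_ACCEPT_CHANNEL:
        return DO_ACCEPT;
    case UDP_RECV_CHANNEL:
        return DO_RECV;
    case TCP_CLIENT_CHANNEL:
        if (revents & (POLLHUP | POLLERR))
            return DO_CLOSE;    /* client went away: free its pool, drop it */
        return DO_SEND;         /* writable: push more buffered data */
    }
    return DO_CLOSE;
}
```

the point is that a client channel waits for writability rather than readability, so gmond only touches it when a send cannot block.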

that is the "hard" solution. the big drawback of the hard solution is that it will be more memory intensive (since we write the data to memory before we start writing to the client). why do we need to do that, you ask? the data in the hash_table may have changed underneath the client and we have no way to handle that (we don't have continuation code in gmond). having everything in memory means we just update an "int bytesWritten" attribute in the client structure. when bytesWritten == bytesTotal, we close the client connection and clean up.
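
to make the bytesWritten bookkeeping concrete, here is a minimal sketch under some assumptions: plain POSIX write() on a non-blocking socket instead of APR, malloc/free standing in for the per-client memory pool, and a hypothetical struct tcp_client with hypothetical helper names. only bytesWritten and bytesTotal come from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum client_status { CLIENT_PENDING, CLIENT_DONE, CLIENT_FAILED };

/* hypothetical per-client state */
struct tcp_client {
    int    fd;            /* non-blocking socket to the remote client */
    char  *data;          /* private snapshot of the XML output */
    size_t bytesTotal;    /* size of the snapshot */
    size_t bytesWritten;  /* how much has been sent so far */
};

int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* copy the XML so later hash table changes cannot affect this client */
int tcp_client_init(struct tcp_client *c, int fd, const char *xml, size_t len)
{
    assert(c != NULL && xml != NULL);
    c->data = malloc(len ? len : 1);
    if (c->data == NULL)
        return -1;
    memcpy(c->data, xml, len);
    c->fd = fd;
    c->bytesTotal = len;
    c->bytesWritten = 0;
    return 0;
}

/* called on each TCP_CLIENT_CHANNEL event: write what we can, never block */
enum client_status tcp_client_flush(struct tcp_client *c)
{
    while (c->bytesWritten < c->bytesTotal) {
        ssize_t n = write(c->fd, c->data + c->bytesWritten,
                          c->bytesTotal - c->bytesWritten);
        if (n > 0) {
            c->bytesWritten += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return CLIENT_PENDING;   /* wait for the next writable event */
        return CLIENT_FAILED;        /* disconnect or permanent failure */
    }
    return CLIENT_DONE;              /* bytesWritten == bytesTotal */
}

/* stand-in for destroying the client's memory pool */
void tcp_client_release(struct tcp_client *c)
{
    free(c->data);
    c->data = NULL;
    close(c->fd);
}
```

on CLIENT_DONE or CLIENT_FAILED the caller would remove the channel from the pollset and call tcp_client_release(). a real daemon would also need to ignore SIGPIPE so that a write to a closed client returns an error instead of killing the process.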

the "simple" approach has the downside that a client can hold up gmond for some period of time. the "hard" approach will make gmond more memory intensive.

of course, we (over time) can implement both and let the configuration decide. i can update the code sometime this weekend to support the "simple" solution and open the debate for the "hard" one.

--
[EMAIL PROTECTED]
  http://massie.us



