[Ganglia-developers] EAGAIN (again)

matt massie Sat, 18 Mar 2006 12:04:44 -0800

guys-

there are two ways to solve this problem:
    (simple) make all XML ports blocking (with a timeout)
    (hard) see below

important: we cannot have the XML port put gmond to sleep when EAGAIN is returned. that will make data collection/announcements stop until the remote client is completely serviced or timed out. all of gmond will be paused.

the way to fix this problem correctly is to look at the poll_listen_channels() and the process_tcp_accept_channel() functions in gmond.c.

in gmond there is an apr_pollset_t *listen_channels global variable. this pollset contains an entry for each "tcp_accept_channel" and each "udp_recv_channel" that is defined in gmond.conf. the "enum Ganglia_channel_types" defines all the types of channels in this pollset (currently TCP_ACCEPT_CHANNEL and UDP_RECV_CHANNEL).
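for anyone who has not read that part of gmond.c, the idea is roughly the following. this is a simplified sketch and not the actual code: the struct channel record and the dispatch_channel() helper are made up for illustration, the udp handler name is an assumption, and the real code gets its ready descriptors from the APR pollset rather than from a plain struct.

```c
#include <stddef.h>

/* the two channel types named above */
enum Ganglia_channel_types {
    TCP_ACCEPT_CHANNEL,
    UDP_RECV_CHANNEL
};

/* hypothetical per-channel record; the real one in gmond.c differs */
struct channel {
    enum Ganglia_channel_types type;
    int fd;
};

/* hypothetical helper: name the handler a ready channel is routed to.
 * "process_udp_recv_channel" is an assumed name for the udp handler. */
const char *dispatch_channel(const struct channel *c)
{
    switch (c->type) {
    case TCP_ACCEPT_CHANNEL:
        return "process_tcp_accept_channel";
    case UDP_RECV_CHANNEL:
        return "process_udp_recv_channel";
    }
    return NULL; /* unknown type */
}
```

the point is only that every entry in the pollset carries a type tag, and the poll loop switches on that tag to pick a handler.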

the "hard" solution in steps (i know how much steve wagner loves lists)...

1. add TCP_CLIENT_CHANNEL to the list of channel types in the Ganglia_channel_types enum

2. modify the process_tcp_accept_channel() function.

   2a. instead of writing the data out to the socket immediately, write it to memory allocated from a memory pool

   2b. create a new TCP_CLIENT_CHANNEL and add it to the listen_channels pollset.

3. in the poll_listen_channels() function we need to add a case for TCP_CLIENT_CHANNEL to handle when sending to the client will not block (or when a client disconnects)

4. whenever a TCP_CLIENT_CHANNEL event occurs, try to write to the remote client. on disconnect (or permanent failure), release all the memory in the client's memory pool and remove the channel from the pollset
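the bookkeeping in those steps can be sketched like this. it is a toy version under stated assumptions, not gmond code: a plain poll(2) array stands in for the apr_pollset_t, malloc()/free() stand in for the per-client memory pool, and struct channel, struct pollset, MAX_CHANNELS, add_client_channel() and remove_channel() are all hypothetical names. in gmond proper the equivalent calls would presumably be apr_pool_create()/apr_palloc()/apr_pool_destroy() and apr_pollset_add()/apr_pollset_remove().

```c
#include <poll.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHANNELS 64   /* arbitrary limit for the sketch */

enum Ganglia_channel_types {
    TCP_ACCEPT_CHANNEL,
    UDP_RECV_CHANNEL,
    TCP_CLIENT_CHANNEL    /* step 1: the new type */
};

struct channel {
    enum Ganglia_channel_types type;
    char *buf;            /* snapshot of the XML (clients only) */
    size_t total;         /* bytes in buf */
    size_t written;       /* bytes already sent */
};

struct pollset {
    struct pollfd fds[MAX_CHANNELS];
    struct channel chans[MAX_CHANNELS];
    int count;
};

/* steps 2a and 2b: copy the data into client-owned memory and register
 * the client for "writable" events. returns the slot index, or -1. */
int add_client_channel(struct pollset *ps, int fd, const char *xml, size_t len)
{
    char *copy;
    int i;

    if (ps->count >= MAX_CHANNELS)
        return -1;
    copy = malloc(len ? len : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, xml, len);

    i = ps->count++;
    ps->chans[i].type = TCP_CLIENT_CHANNEL;
    ps->chans[i].buf = copy;
    ps->chans[i].total = len;
    ps->chans[i].written = 0;
    ps->fds[i].fd = fd;
    ps->fds[i].events = POLLOUT;  /* wake up when sending will not block */
    ps->fds[i].revents = 0;
    return i;
}

/* step 4 cleanup: release the client's memory and drop it from the set
 * by moving the last entry into its slot. */
void remove_channel(struct pollset *ps, int i)
{
    free(ps->chans[i].buf);
    ps->count--;
    ps->fds[i] = ps->fds[ps->count];
    ps->chans[i] = ps->chans[ps->count];
}
```

step 3 is then one more case in the poll loop's switch: on a TCP_CLIENT_CHANNEL event, try to write, and call the cleanup when the client is finished or gone.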

that is the "hard" solution. the big drawback of the hard solution is that it will be more memory intensive (since we write the data to memory before we start writing to the client). why do we need to do that you ask? the data in the hash_table may have changed underneath the client and we have no way to handle that (don't have continuation code in gmond). having everything in memory means we just update an "int bytesWritten" attribute in the client structure. when bytesWritten == bytesTotal, we close the client connection and clean up.
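a minimal sketch of that bytesWritten bookkeeping, using plain POSIX sockets instead of APR. the struct client layout, the client_status enum and client_try_write() are hypothetical, size_t is used for the counters instead of int, and MSG_DONTWAIT (available on Linux and the BSDs) is assumed so that the call never blocks.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* hypothetical client record: the whole response already sits in buf */
struct client {
    int fd;
    const char *buf;
    size_t bytesTotal;
    size_t bytesWritten;
};

enum client_status { CLIENT_PENDING, CLIENT_DONE, CLIENT_FAILED };

/* call when the poll loop reports the client socket as writable.
 * sends as much as the kernel will take without blocking. */
enum client_status client_try_write(struct client *c)
{
    while (c->bytesWritten < c->bytesTotal) {
        ssize_t n = send(c->fd, c->buf + c->bytesWritten,
                         c->bytesTotal - c->bytesWritten, MSG_DONTWAIT);
        if (n > 0) {
            c->bytesWritten += (size_t)n;   /* remember our progress */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                       /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return CLIENT_PENDING;          /* go back to the poll loop */
        return CLIENT_FAILED;               /* disconnect or hard error */
    }
    return CLIENT_DONE;                     /* bytesWritten == bytesTotal */
}
```

on CLIENT_PENDING nothing is lost: the next writable event picks up at buf + bytesWritten. on CLIENT_DONE or CLIENT_FAILED the caller closes the connection and frees the client's memory.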

the "simple" approach has the downside that a client can hold up gmond for some period of time. the "hard" approach will make gmond more memory intensive.
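for comparison, the "simple" option boils down to putting an upper bound on how long a blocking send may stall. a rough POSIX sketch, assuming a raw socket descriptor; set_send_timeout() is a hypothetical helper, and since gmond goes through APR sockets the real change would presumably use apr_socket_timeout_set() instead.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* hypothetical helper: make blocking sends on fd give up after
 * `seconds` instead of waiting forever. returns 0 on success. */
int set_send_timeout(int fd, long seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
}
```

the timeout value is exactly the "some period of time" a slow client can hold up gmond, so it has to be weighed against how often data collection needs to run.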

of course, we (over time) can implement both and let the configuration decide. i can update the code sometime this weekend to support the "simple" solution and open the debate for the "hard" one.


--
[EMAIL PROTECTED]
  http://massie.us
