>>> On 9/17/2007 at 9:23 PM, in message <[EMAIL PROTECTED]>, Mike Walker
<[EMAIL PROTECTED]> wrote:
> Bernard,
> No go. This doesn't have the patch that I sent to work around the OSX
> issues in gmetad. It does have the suggestion by Brad of putting
> an if statement in the read loop to test for POLLHUP. However,
> from the previous beta (3.0.5 on ~ Sept 10th) testing cycle and my
> email response back to the list after that beta, his suggestion
> doesn't work on OSX.
>
> The reason is that the kernel is done reading off the socket and sets
> the POLLHUP flag BEFORE gmetad finishes reading the entire buffer.
> Thus, by breaking out of the read loop before the entire buffer is
> read, we get an incomplete message, and the message is discarded by
> the XML parser. The discarded messages result in an incorrect display
> in the Ganglia PHP front end: machines reported as down, gaps in
> monitoring, etc.
>
I am sure that you are correct, so help me understand what is going on
here. From what I could get from Google searches, different platforms
indicate an EOF in different ways. Some set just POLLIN, and EOF is then
detected by checking bytes_read == 0 after a read(). In that case a
revents value of POLLHUP only indicates a broken connection. Other
platforms, however, report POLLIN | POLLHUP, with the POLLHUP indicating
the EOF. On those platforms an extra read() looking for bytes_read == 0
would be unnecessary: a final read() can be done and EOF determined all
in the same operation.

In the data_thread.c code as it was originally, a POLLIN with
bytes_read == 0 would have functioned as expected. But a POLLIN | POLLHUP
with bytes_read == <anything> would have resulted in aborting the
connection altogether without processing any of the data that had
already been read. By adding a check for POLLHUP within the POLLIN
handling, aborting the connection is avoided and the data is processed
normally.

Are you saying that even if POLLIN | POLLHUP is received and all of the
data is read from the socket, there is still more data on the socket and
a subsequent read must still be done until bytes_read == 0? I guess the
Curl guy just decided to treat POLLHUP the same as POLLIN. Does that seem
safe for all platforms?

If my assumptions are incorrect, which it looks like they are, then it
seems to me that going back to your original patch would be the best
solution. Thoughts?

Brad
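
To make the question concrete, here is a minimal sketch of the "treat
POLLHUP like POLLIN and keep reading until read() returns 0" approach.
It is not the data_thread.c code or Mike's patch: read_until_eof, its
parameters, and the timeout handling are hypothetical names chosen for
illustration, and it assumes a blocking descriptor and a caller-supplied
buffer.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not gmetad code).
 * Reads from fd until read() reports EOF (returns 0), the buffer is
 * full, or an error/timeout occurs.
 * Returns the number of bytes stored, or -1 on error or timeout.
 */
ssize_t read_until_eof(int fd, char *buf, size_t cap, int timeout_ms)
{
    size_t total = 0;

    assert(buf != NULL && cap > 0);

    while (total < cap) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        if (rc == 0)
            return -1;              /* timed out waiting for data */

        if (pfd.revents & (POLLIN | POLLHUP)) {
            /*
             * POLLHUP may be set while unread data is still queued,
             * so do not treat it as EOF. Only read() returning 0 is.
             */
            ssize_t n = read(fd, buf + total, cap - total);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (n == 0)
                break;              /* EOF: everything has been read */
            total += (size_t)n;
            continue;
        }

        return -1;                  /* POLLERR or POLLNVAL */
    }
    return (ssize_t)total;
}
```

With this shape, a platform that reports POLLIN | POLLHUP while data is
still pending simply goes around the loop again, and a platform that
reports only POLLIN reaches the same bytes_read == 0 exit. A buffer that
fills before EOF would still need handling by the caller.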
