>>> On 9/17/2007 at 9:23 PM, in message
<[EMAIL PROTECTED]>, Mike Walker
<[EMAIL PROTECTED]> wrote:
> Bernard,
>       No go.  This doesn't have the patch that I sent to work around the
> OS X issues in gmetad.  It does have the suggestion by Brad of putting
> an if statement in the read loop to test for POLLHUP.  However, as I
> reported to the list after the previous beta testing cycle (3.0.5, on
> about Sept 10th), his suggestion doesn't work on OS X.
> 
> The reason is that the kernel is done reading off the socket and sets
> the POLLHUP flag BEFORE gmetad finishes reading the entire buffer.
> Thus, by breaking out of the read loop before the entire buffer is
> read, we get an incomplete message, and the message is then discarded
> by the XML parser.  The discarded messages result in an incorrect
> display in the Ganglia PHP front end: machines reported as down, gaps
> in monitoring, etc.
> 

   I am sure that you are correct, so help me understand what is going on here.
From what I could gather from Google searches, different platforms indicate EOF
in different ways.  Some set just POLLIN, and EOF is then detected by checking
for bytes_read == 0 after a read().  On those platforms a revents of POLLHUP
only indicates a broken connection.  Other platforms, however, report
POLLIN | POLLHUP, with the POLLHUP indicating EOF.  In that case an extra read()
looking for bytes_read == 0 would be unnecessary: a final read() can be done and
EOF determined all in the same operation.  In the data_thread.c code as it was
originally, a POLLIN with bytes_read == 0 would have functioned as expected, but
a POLLIN | POLLHUP with bytes_read == <anything> would have aborted the
connection altogether without processing any of the data that had already been
read.  By adding a check for POLLHUP within the POLLIN handling, aborting the
connection is avoided and the data is processed normally.
   Are you saying that even if POLLIN | POLLHUP is received and a read() is
done, there can still be more data on the socket, so that subsequent reads must
still be done until bytes_read == 0?  I guess the Curl guy just decided to
treat POLLHUP the same as POLLIN.  Does that seem safe for all platforms?  If my
assumptions are incorrect, which it looks like they are, then it seems to me
that going back to your original patch would be the best solution.  Thoughts?
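
   To make the question concrete, here is a minimal sketch of the loop I think
we would need if POLLHUP cannot be trusted to mean "nothing left to read".  It
is not the data_thread.c code: read_until_eof() and its parameters are
hypothetical names for illustration only.  The idea is to treat POLLHUP exactly
like POLLIN, keep calling read(), and stop only when read() itself returns 0.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the actual gmetad code).
 * Reads from fd into buf until read() returns 0 (EOF) or buf is full.
 * Returns the number of bytes read, or -1 on error or timeout. */
static ssize_t read_until_eof(int fd, char *buf, size_t cap, int timeout_ms)
{
    size_t total = 0;
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;

    while (total < cap) {
        int rc;
        ssize_t n;

        pfd.revents = 0;
        rc = poll(&pfd, 1, timeout_ms);
        if (rc < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, poll again */
            return -1;
        }
        if (rc == 0)
            return -1;              /* timed out waiting for the peer */
        if (pfd.revents & POLLNVAL)
            return -1;              /* fd is not open */

        /* POLLIN, POLLHUP, or both: do not break out on POLLHUP.
         * Buffered data may remain, so let read() decide when we are done.
         * On POLLERR, read() reports the underlying error. */
        n = read(fd, buf + total, cap - total);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return -1;
        }
        if (n == 0)
            break;                  /* EOF: the only normal way out */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

   With something like this, the revents flags only decide whether we attempt a
read at all; the end of the message is determined solely by read() returning 0,
which should behave the same whether the platform reports POLLIN alone or
POLLIN | POLLHUP.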

Brad


_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers