Sorry, it is happening on 3.0.4.  It was straightforward and simple
to compile (using GCC 4.0.1 on OSX 10.4.10).  I searched for
another way to determine whether the connection was lost.  The only way I
could find was a return value < 0 from the read, which is already being trapped.

Searching the newsgroups for FreeBSD and OSX, everything points to not
using POLLHUP on socket connections.  A quick look at the Darwin
source code indicates that reaching the end of the buffer sets
POLLHUP.  Thus, unless I hit it just right (e.g. the kernel is still
reading off the socket during the poll()), POLLHUP would always be set
when data is available for reading, so POLLIN and POLLHUP are
(almost) always both set after the poll().

Since OSX's Unix heritage comes from FreeBSD, I would think others
would be having this issue too?  Or does FreeBSD never set this on
a socket?  Or?
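
A quick way to check how a given platform answers that question is a
small probe like the one below.  It is only a sketch:
probe_revents_after_peer_close is a hypothetical helper (not part of
Ganglia), and it uses an AF_UNIX socketpair() rather than a TCP
connection, so the flags it reports may differ from what gmetad sees
on a real network socket.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: send msg on one end of a socketpair, close that
 * end, then poll() the other end once.
 * Returns the revents bits reported by poll(), or -1 on failure. */
int probe_revents_after_peer_close(const char *msg)
{
    int sv[2];
    struct pollfd pfd;
    size_t len;
    int rc;

    assert(msg != NULL);
    len = strlen(msg);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Queue the whole message, then close the sending side. */
    if (write(sv[0], msg, len) != (ssize_t)len) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[0]);

    /* Ask only for POLLIN; POLLHUP and POLLERR are reported regardless. */
    pfd.fd = sv[1];
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, 1000);
    close(sv[1]);

    if (rc < 0)
        return -1;
    return pfd.revents;
}
```

Calling it with a short message and testing the result against POLLIN
and POLLHUP shows whether both flags arrive together on that platform
while unread data is still queued.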

**** from xnu-792.13.8/bsd/kern/sys_generic.c

static int
poll_callback(__unused struct kqueue *kq, struct kevent *kevp, void *data)
{
        struct poll_continue_args *cont = (struct poll_continue_args *)data;
        struct pollfd *fds = CAST_DOWN(struct pollfd *, kevp->udata);
        short mask;

        /* convert the results back into revents */
        if (kevp->flags & EV_EOF)
                fds->revents |= POLLHUP;
        if (kevp->flags & EV_ERROR)
                fds->revents |= POLLERR;
        cont->pca_rfds++;

******* xnu-792.13.8/bsd/kern/sys_generic.c

Thus the patch file is simply......

********************
--- data_thread.c.org   2007-08-26 15:28:18.000000000 -0700
+++ data_thread.c       2007-08-26 15:29:22.000000000 -0700
@@ -131,12 +131,14 @@
                                }
                             read_index+= bytes_read;
                          }
+        #if !(defined(DARWIN)) /* Appears that OSX sets POLLHUP on sockets once I have loaded the entire message into the buffer...not that I lost the connection (see FreeBSD lists on this discussion) */
                       if( struct_poll.revents & POLLHUP )
                          {
                             err_msg("The remote machine closed connection for [%s] data source after %d bytes read", d->name, read_index);
                             d->dead = 1;
                             goto take_a_break;
                          }
+       #endif /*DARWIN*/
                       if( struct_poll.revents & POLLERR )
                          {
                             err_msg("POLLERR! for [%s] data source after %d bytes read", d->name, read_index);

*********************
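
A possible alternative to the #if, as a rough sketch that has not been
tried inside gmetad: ignore POLLHUP as a verdict and let read() decide.
Whenever poll() reports POLLIN or POLLHUP, call read(); a return of 0
means the peer closed after everything was delivered, and only a
negative return is an error.  read_until_eof and its parameters are
hypothetical names for illustration; only poll() and read() are real
calls.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd into buf until the peer closes,
 * the buffer is full, or poll() times out.
 * Returns the number of bytes read, or -1 on error.
 * *peer_closed is set to 1 only when read() itself returns 0. */
ssize_t read_until_eof(int fd, char *buf, size_t cap,
                       int timeout_ms, int *peer_closed)
{
    size_t total = 0;

    assert(buf != NULL && peer_closed != NULL);
    *peer_closed = 0;

    while (total < cap) {
        struct pollfd pfd;
        int rc;

        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;

        rc = poll(&pfd, 1, timeout_ms);
        if (rc < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (rc == 0)
            break;                      /* timed out, nothing more yet */

        if (pfd.revents & (POLLERR | POLLNVAL))
            return -1;                  /* treated as a hard error here */

        /* POLLHUP alone is not taken as proof of data loss: read()
         * returns any queued bytes first and 0 only at end of stream. */
        if (pfd.revents & (POLLIN | POLLHUP)) {
            ssize_t n = read(fd, buf + total, cap - total);
            if (n < 0) {
                if (errno == EINTR || errno == EAGAIN)
                    continue;
                return -1;
            }
            if (n == 0) {
                *peer_closed = 1;       /* orderly close by the peer */
                break;
            }
            total += (size_t)n;
        }
    }
    return (ssize_t)total;
}
```

With something like this, the "remote machine closed connection"
message would be logged only when nothing was read at all, instead of
discarding a message that arrived complete.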

Mike




On Aug 24, 2007, at 4:32 PM, Bernard Li wrote:

> Hi Mike:
>
> On 8/24/07, Mike Walker <[EMAIL PROTECTED]> wrote:
>
>>         Platform:  MacOSX 10.4.10 (Intel); will confirm the same
>> problem Monday on MacOSX 10.4.10 (PowerPC)
>>         Problem:  The data from all the nodes/clusters was not
>> being saved (consistently and accurately) to the rrd files.  In
>> debugging this issue, it appears the problem is at line 134 of
>> gmetad/data_thread.c:  if(struct_poll.revents & POLLHUP).
>>
>>         From web searches (http://www.greenend.org.uk/rjk/2001/06/poll.html,
>> http://www.osxfaq.com/man/2/poll.ws, etc.) it appears that POLLHUP
>> is not universal in its definition and implementation.
>> On OSX it appears that the entire message is received (partially
>> verified using ethereal) even though POLLHUP evaluates to true.
>> With the logic in the 'if' statement, the entire message
>> is discarded, and thus almost all (if not all) data is never saved to the
>> rrd files and all hosts are reported as either not existing or down.
>>
>>         By commenting out this if statement (line 134 block),
>> gmetad works wonderfully (since all data is being received).
>
> Perhaps simply checking POLLHUP is not sufficient.  We might need to
> check something else before we discard the message -- think you can
> write a patch?
>
>>         In searching the ganglia mailing lists, I couldn't find
>> anything....  is this known?  or is this an issue (and thus I need to
>> open a bug report)?
>
> If you have searched through the bug reports and didn't see it filed
> already, please do:
>
> http://bugzilla.ganglia.info
>
> P.S. Was it difficult to compile this under OSX?  Which version of
> Ganglia are you using?  3.0.4?
>
> Thanks,
>
> Bernard


_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
