I've sent a message to the sysadmin for the bugzilla box. Sorry for the
downtime... it shouldn't take too long to resolve.
-matt
On Wed, 2007-03-28 at 13:12 -0600, Ian Cunningham wrote:
> Timothy,
>
> I have filed a bug for this exact behavior. I would direct you to attach
> this patch to the bug, but bugzilla still seems to be down. If the
> bugzilla admin is on the list, please reply or fix bugzilla.
>
> Thanks,
> Ian
>
> Witham, Timothy D wrote:
> > Hi,
> >
> > I just had a situation where the first host in a gmetad data_source
> > accepts the connection but offers no data, like this:
> >
> > poll() timeout for [clustername] data source after 0 bytes read
> >
> > Gmetad always tries the sources in order and so it just keeps getting
> > stuck on this first one, and losing the data for the entire cluster.
> >
> > Here is a quick patch that tries random hosts from the list instead,
> > and it solved my problem. It does not guarantee that every host gets
> > tried (it makes up to 2 * num_sources random picks, which can repeat),
> > but if it fails it will just try again next time. If someone
> > wants to fix it to try all the sources in a random order, that would
> > be fine. Perhaps this could be included in the next release unless
> > someone knows a good reason to always try the sources in order.
> >
> > Thanks!
> >
> > -8<---------------------------------------------------------
> > diff -c -r1.1.1.1 data_thread.c
> > *** data_thread.c 19 Mar 2007 18:52:32 -0000 1.1.1.1
> > --- data_thread.c 28 Mar 2007 18:12:08 -0000
> > ***************
> > *** 18,24 ****
> > void *
> > data_thread ( void *arg )
> > {
> > ! int i, sleep_time, bytes_read, rval;
> > data_source_list_t *d = (data_source_list_t *)arg;
> > g_inet_addr *addr;
> > g_tcp_socket *sock=0;
> > --- 18,24 ----
> > void *
> > data_thread ( void *arg )
> > {
> > ! int i, j, sleep_time, bytes_read, rval;
> > data_source_list_t *d = (data_source_list_t *)arg;
> > g_inet_addr *addr;
> > g_tcp_socket *sock=0;
> > ***************
> > *** 60,75 ****
> > if(d->last_good_index >= 0)
> > sock = g_tcp_socket_new ( d->sources[d->last_good_index] );
> >
> > ! /* If there was no good connection last time or the above connect failed then try each host in the list. */
> > if(!sock)
> > {
> > ! for(i=0; i < d->num_sources; i++)
> > {
> > ! /* Find first viable source in list. */
> > ! sock = g_tcp_socket_new ( d->sources[i] );
> > if( sock )
> > {
> > ! d->last_good_index = i;
> > break;
> > }
> > }
> > --- 60,80 ----
> > if(d->last_good_index >= 0)
> > sock = g_tcp_socket_new ( d->sources[d->last_good_index] );
> >
> > ! /* If there was no good connection last time or the above
> > ! connect failed then try random hosts in the list. We try
> > ! random ones in case someone is accepting the connection
> > ! but refusing to provide any data; we don't want to get
> > ! stuck with a non-working host. */
> > if(!sock)
> > {
> > ! for(i=0; i < d->num_sources * 2; i++)
> > {
> > ! /* Find random viable source in list. */
> > ! j = d->num_sources * (rand() / (RAND_MAX + 1.0));
> > ! sock = g_tcp_socket_new ( d->sources[j] );
> > if( sock )
> > {
> > ! d->last_good_index = j;
> > break;
> > }
> > }
> > -8<----------------------------------------------------------
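
On the suggestion to try all the sources in a random order: here is a
minimal sketch of one way to do it, shuffling the indices once
(Fisher-Yates) and then walking them so each host is tried at most once
per pass. The names try_source_fn, random_index, shuffle_indices,
pick_random_source and accept_only are hypothetical and not part of
gmetad; in data_thread() the callback's job would presumably be done by
calling g_tcp_socket_new() on d->sources[i]. It has not been tested
against gmetad itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callback: returns nonzero if source `index` can be used.
   In gmetad this role would be played by g_tcp_socket_new(). */
typedef int (*try_source_fn)(int index, void *ctx);

/* Random index in [0, n). Dividing by RAND_MAX + 1.0 keeps the quotient
   strictly below 1, so the result never reaches n. */
static int random_index(int n)
{
    return (int)(n * (rand() / (RAND_MAX + 1.0)));
}

/* Fill order[0..n-1] with 0..n-1, then shuffle in place (Fisher-Yates). */
static void shuffle_indices(int *order, int n)
{
    int i, j, tmp;

    assert(order != NULL || n <= 0);
    for (i = 0; i < n; i++)
        order[i] = i;
    for (i = n - 1; i > 0; i--) {
        j = random_index(i + 1);      /* j is in [0, i] */
        tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
}

/* Try every source exactly once, in random order.
   Returns the index of the first source that works, or -1 if none do. */
int pick_random_source(int n, try_source_fn try_source, void *ctx)
{
    int *order;
    int i, found = -1;

    if (n <= 0)
        return -1;
    order = malloc((size_t)n * sizeof *order);
    if (order == NULL)
        return -1;

    shuffle_indices(order, n);
    for (i = 0; i < n; i++) {
        if (try_source(order[i], ctx)) {
            found = order[i];
            break;
        }
    }
    free(order);
    return found;
}

/* Example callback for illustration only: the one source whose index
   is stored in *ctx accepts, all others refuse. */
static int accept_only(int index, void *ctx)
{
    return index == *(const int *)ctx;
}
```

With accept_only as the callback, pick_random_source(5, accept_only, &good)
returns the value of good whenever it is in 0..4, whatever order the
shuffle produces, and -1 when no source accepts. Note that this only
covers hosts that refuse the connection outright; a host that accepts
and then sends nothing would still need last_good_index cleared after
the poll() timeout so it is not retried first.
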
> >
> >
>
>
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general