Hi Jeff,

It's best if you can submit a pull request for this issue.

Thank you,
Vladimir


On 05/31/2018 at 12:24 PM, Jeffrey Frey wrote:
Background
==========

On a new cluster we are building right now I moved from Ganglia 3.6.1 to 3.7.2.  3.6.1 has been rock-solid on previous clusters.  Once the 3.7.2 gmond has been up for a short period of time, it begins emitting the error message:


Incorrect format for spoof argument. exiting.



Debugging
=========

If I enable debugging (e.g. -d 4) I'm shown the parsed contents of the spoof string, and they are non-empty garbage strings.  Tracing in gdb with a breakpoint on that error message shows that the metric_id passed to the function has a non-zero .spoof and a garbage string in .host.


In one trace, the .host was an empty string (""); the code in Ganglia_host_get() assumes that if .spoof is non-zero, then .host is non-null and a string with length > 0.  So the subsequent code:


      spoof_info_len = strlen(metric_id->host);
      buff = malloc(spoof_info_len+1);
      strncpy(buff, metric_id->host, spoof_info_len + 1);
      spoofIP = buff;
      if( !(spoofName = strchr(buff+1,':')) ){


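can read past the end of the buffer for a zero-length string: buff is then a single byte holding only the terminator, and strchr() starts scanning at buff+1, one byte beyond it.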
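
As a minimal sketch of a more defensive split of the "ip:name" spoof string, assuming the format is everything before the first colon as the IP and everything after it as the name: split_spoof() below is a hypothetical helper, not a function from the Ganglia source, and it rejects NULL and empty input before doing any pointer arithmetic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Ganglia: splits "ip:name" into two
 * heap-allocated strings that the caller must free().
 * Returns 0 on success, -1 on malformed input or allocation failure. */
static int split_spoof(const char *host, char **out_ip, char **out_name)
{
    assert(out_ip != NULL && out_name != NULL);
    *out_ip = NULL;
    *out_name = NULL;

    /* Reject NULL and empty strings before touching host[1]. */
    if (host == NULL || host[0] == '\0')
        return -1;

    const char *colon = strchr(host, ':');
    /* Require a non-empty part on both sides of the first colon. */
    if (colon == NULL || colon == host || colon[1] == '\0')
        return -1;

    size_t ip_len = (size_t)(colon - host);
    size_t name_len = strlen(colon + 1);
    char *ip = malloc(ip_len + 1);
    char *name = malloc(name_len + 1);
    if (ip == NULL || name == NULL) {
        free(ip);
        free(name);
        return -1;
    }

    memcpy(ip, host, ip_len);
    ip[ip_len] = '\0';
    memcpy(name, colon + 1, name_len + 1);  /* copies the terminator too */

    *out_ip = ip;
    *out_name = name;
    return 0;
}
```

With a check like this, an empty .host would be reported as a malformed spoof argument instead of being scanned past its terminator.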


To isolate possible reasons for the botched spoofing hostname I compared the gmond/gmond.c source between 3.6.1 and 3.7.2.  In Ganglia_collection_group_send() the following code


            name = cb->msg.Ganglia_value_msg_u.gstr.metric_id.name;
            if (override_hostname != NULL)
              {
                cb->msg.Ganglia_value_msg_u.gstr.metric_id.host = apr_pstrcat(gm_pool, (char *)( override_ip != NULL ? override_ip : override_hostname ), ":", (char *) override_hostname, NULL);
                cb->msg.Ganglia_value_msg_u.gstr.metric_id.spoof = TRUE;
              }


is allocating the callback's .host field from the temporary metrics APR pool, but the callback is external to this function and lives on beyond the destruction of that temporary APR pool.  Eventually the memory behind cb->msg.Ganglia_value_msg_u.gstr.metric_id.host will be reused and overwritten, yielding the "garbage string" condition that's being observed.  In 3.6.1, the .host field was allocated from global_context.  If I modify the code cited above to use global_context rather than gm_pool, gmond runs without throwing "Incorrect format for spoof argument" errors.
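
The lifetime problem can be illustrated with a toy model. This is only a sketch: toy_pool is a hypothetical bump allocator standing in for an APR pool, toy_callback stands in for the long-lived callback, and none of these names come from Ganglia or APR. Clearing the toy pool only rewinds its offset, so the reuse is well-defined here, whereas with a real destroyed pool the stale pointer would be undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy bump allocator standing in for an APR pool (illustration only). */
typedef struct {
    char buf[256];
    size_t used;
} toy_pool;

/* Copy a string into the pool and return the pool-owned copy. */
static char *toy_strdup(toy_pool *p, const char *s)
{
    size_t n = strlen(s) + 1;
    assert(p->used + n <= sizeof p->buf);
    char *dst = p->buf + p->used;
    memcpy(dst, s, n);
    p->used += n;
    return dst;
}

/* Like clearing a pool: the same memory is handed out again afterwards. */
static void toy_clear(toy_pool *p) { p->used = 0; }

/* Stand-in for the long-lived callback that keeps a host pointer. */
typedef struct { const char *host; } toy_callback;

/* Returns 1 if cb.host still holds `expected` after the short-lived pool
 * has been cleared and reused, 0 if it was overwritten. */
static int host_survives(toy_pool *long_lived, toy_pool *short_lived,
                         int use_long_lived, const char *expected)
{
    toy_callback cb;
    toy_pool *src = use_long_lived ? long_lived : short_lived;
    cb.host = toy_strdup(src, expected);

    /* The short-lived pool goes away and its memory is reused. */
    toy_clear(short_lived);
    toy_strdup(short_lived, "unrelated metric data");

    return strcmp(cb.host, expected) == 0;
}
```

In this model the string allocated from the long-lived pool survives, while the one allocated from the short-lived pool is overwritten by the next allocation, which mirrors the garbage seen in .host.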


Also, in lib/libgmond.c the static global "myhost"


static char myhost[APRMAXHOSTLEN+1];


is assumed by the rest of the code to have been initialized by the compiler to be a zero-length string:


  if (myhost[0] == '\0')
      apr_gethostname( (char*)myhost, APRMAXHOSTLEN+1, gm_pool);


Probably best to be explicit about the initial value of myhost and not assume an initial value?


static char myhost[APRMAXHOSTLEN+1] = "";
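
For reference, C zero-initializes objects with static storage duration, so the explicit initializer mainly documents the intent. A minimal sketch of the lazy-initialization pattern follows, with hypothetical stand-ins: TOY_MAXHOSTLEN replaces APRMAXHOSTLEN, and toy_gethostname() replaces apr_gethostname() and returns a fixed name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAXHOSTLEN 64   /* hypothetical stand-in for APRMAXHOSTLEN */

/* Static storage is zero-initialized in C; "= \"\"" states that intent. */
static char cached_host[TOY_MAXHOSTLEN + 1] = "";
static int lookups = 0;     /* counts how often the lookup really runs */

/* Hypothetical stand-in for apr_gethostname(): fills buf with a fixed name. */
static void toy_gethostname(char *buf, size_t len)
{
    assert(len > 0);
    strncpy(buf, "node01", len - 1);
    buf[len - 1] = '\0';
    lookups++;
}

/* Look the hostname up once, then serve it from the static cache. */
static const char *get_host(void)
{
    if (cached_host[0] == '\0')
        toy_gethostname(cached_host, sizeof cached_host);
    return cached_host;
}
```

Calling get_host() repeatedly performs the lookup only on the first call, because the empty-string check fails once the cache has been filled.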


Happy to contribute patch files, etc.




::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::








_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers



