Hi Michael,

The fix for the overrun looks good to me. But your code still has the
problem that loopback traffic is counted, too:
perfstat_netinterface_total is the sum over all network devices,
including lo0, etc.
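
A minimal sketch of what I mean, assuming per-interface byte counts are
available somewhere (on AIX they would have to come from libperfstat).
The struct iface_stat, both function names, and the "lo<digit>" naming
rule are made up for illustration and are not part of any library:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one per-interface record; on AIX the real
 * data would come from libperfstat, not from this struct. */
struct iface_stat {
    char               name[64];  /* interface name, e.g. "en0" or "lo0" */
    unsigned long long ibytes;    /* bytes received on this interface */
};

/* Assumption: loopback interfaces are named "lo" followed by a digit. */
static int is_loopback(const char *name)
{
    return strncmp(name, "lo", 2) == 0 && name[2] >= '0' && name[2] <= '9';
}

/* Sum received bytes over all interfaces except loopback. */
static unsigned long long
sum_ibytes_no_loopback(const struct iface_stat *stats, size_t count)
{
    unsigned long long total = 0;
    size_t i;

    assert(stats != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (!is_loopback(stats[i].name))
            total += stats[i].ibytes;
    }
    return total;
}
```

With records for en0 (1000 bytes), lo0 (500 bytes) and en1 (250 bytes),
the sum would be 1250 instead of the 1750 that a total over all devices
reports.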

Best regards
   Andreas




Michael Perzl wrote:
>  Andreas,
> 
> thank you for taking the blame but you are off the hook here.  ;-)
> 
> If I understood David correctly, he is using my AIX Ganglia RPM packages
> with POWER5 extensions. There, most if not all of the implementation of
> how the metrics are collected under AIX has been changed. Everything is
> documented on my homepage (http://www.perzl.org/ganglia/), though.
> So everything that goes wrong here is entirely my fault :-[
> 
> After some investigating and some discussions with Nigel I have come to
> terms with the following facts regarding the bytes_in/bytes_out problem:
> - libperfstat (the library on AIX which obtains all the system
> performance data) uses u_longlong_t data types (these are definitely
> 64 bits wide).
> - Internally, though, the AIX kernel is probably not using 64-bit
> data types; unsigned 32-bit is more likely, in order not to break
> compatibility (my personal opinion).
> - The consequence is that an integer overrun occurs much more easily
> with 32-bit data types than with 64-bit data types (with the latter,
> we all probably won't live long enough to see it happen).
> 
> Please take a look at my implementation of the bytes_in metric (the
> bytes_out implementation is analogous):
> 
> 01  g_val_t
> 02  bytes_in_func( void )
> 03  {
> 04     g_val_t val;
> 05     perfstat_netinterface_total_t n;
> 06     static u_longlong_t last_bytes_in = 0, bytes_in;
> 07     static double last_time = 0.0;
> 08     double now, delta_t;
> 09     struct timeval timeValue;
> 10     struct timezone timeZone;
> 11
> 12     gettimeofday( &timeValue, &timeZone );
> 13
> 14     now = (double) (timeValue.tv_sec - boottime) + (timeValue.tv_usec / 1000000.0);
> 15
> 16     if (perfstat_netinterface_total( NULL, &n, sizeof( perfstat_netinterface_total_t ), 1 ) == -1)
> 17        val.f = 0.0;
> 18     else
> 19     {
> 20        bytes_in = n.ibytes;
> 21
> 22        delta_t = now - last_time;
> 23
> 24        if ( delta_t )
> 25           val.f = (double) (bytes_in - last_bytes_in) / delta_t;
> 26        else
> 27           val.f = 0.0;
> 28
> 29        last_bytes_in = bytes_in;
> 30     }
> 31
> 32     last_time = now;
> 33
> 34     return( val );
> 35  }
> 
> In my opinion the overrun occurs in line #25 when "bytes_in <
> last_bytes_in".
> In my naivety I had assumed that, as both are of type u_longlong_t, an
> integer overrun could never happen.
> 
> So to solve the overrun a check for "bytes_in < last_bytes_in" must be
> introduced, something like:
> 
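> u_longlong_t d;
> if (bytes_in >= last_bytes_in)
>    d = bytes_in - last_bytes_in;
> else  /* counter wrapped; assumes a 32-bit counter and 32-bit unsigned long */
>    d = bytes_in + ((u_longlong_t) ULONG_MAX + 1) - last_bytes_in;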
> 
> and line #25 would essentially become
> 25           val.f = (double) d / delta_t;
> 
> Comments ?
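
One detail on the check, as a sketch only: with unsigned operands the
difference itself can never be negative, so a wrap has to be detected
by comparing the two readings before subtracting. The helper name
counter_delta and its wrap_bits parameter are made up for illustration,
and the 32-bit counter width is only the guess from above, not something
confirmed for the AIX kernel:

```c
#include <assert.h>

/* Hypothetical helper: difference between two readings of an unsigned
 * counter that is stored in 64 bits but may wrap at 2^wrap_bits.
 * Assumes at most one wrap between the readings and, for widths below
 * 64, that both readings are smaller than 2^wrap_bits. */
static unsigned long long
counter_delta(unsigned long long now, unsigned long long last,
              unsigned int wrap_bits)
{
    assert(wrap_bits >= 1 && wrap_bits <= 64);

    if (now >= last)
        return now - last;   /* no wrap between the readings */

    if (wrap_bits == 64)
        return now - last;   /* unsigned arithmetic is already modulo 2^64 */

    /* The counter wrapped once: add the modulus 2^wrap_bits back in. */
    return now + (1ULL << wrap_bits) - last;
}
```

Under that assumption, line #25 could use
(double) counter_delta( bytes_in, last_bytes_in, 32 ) / delta_t.
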
> 
> Regards,
> Michael
> 
> PS: David, the reason why you don't see it happen with pkts_in and
> pkts_out is probably that no overrun has occurred so far, but at some
> point it will happen there, too.
> 
> PPS: David, if this is a solution (I want some comments on that first,
> though) then I will build new RPMs with the then hopefully
> correct code.
> 

-- 
           Dr. Andreas Schoenfeld | Dr. Andreas Schoenfeld
                                  |
Technische Universitaet Darmstadt | Technische Universitaet Darmstadt
      Hochschulrechenzentrum (HRZ)| University Computing Centre
                                  |
               Petersenstrasse 30 | Petersenstrasse 30
                  64287 Darmstadt | 64287 Darmstadt
                                  | Germany
                                  |
              Tel.  06151-16 5608 | Tel. +49 (0) 6151-16 5608

             e-mail: [EMAIL PROTECTED]
