Bart Van Assche wrote:
 > On Tue, Aug 4, 2009 at 4:06 PM,
 > [email protected]<[email protected]> wrote:
 >> I have an application using net-snmp 5.4.2.1 which after some time -
 >> seemingly proportional to the number of snmp requests - will terminate
 >> with a glibc double free error. The application is doing repetitive and
 >> simple SNMP_MSG_GET requests. After some time - could be a number of
 >> weeks, snmp_synch_response will return STAT_ERROR, and when
 >> snmp_close(ss) is called - the double free is detected after
 >> snmp_sess_close calls snmp_free_pdu. (full gdb output at the end).
 >
 > Did you already try to analyze the process that triggers this behavior
 > with Valgrind (http://www.valgrind.org/) ?
 >
 > Bart.

Hi,

I tried to eliminate all the valgrind errors in my code - there were 
only 5 to begin with, and 4 were trivial. The double free error is still 
being produced after the 4 minor errors were resolved.

The 1 remaining valgrind error (reproduced at the end in full) is 
complaining about

"socketcall.sendmsg(msg.msg_control) points to uninitialised byte(s)".

This occurs every time an SNMP request is sent. I tested with a simple 
standard command-line snmpget, and valgrind still issues the same error. 
Using my code, I placed a breakpoint in gdb at the point where the request 
is sent (netsnmp_udp_send [snmpUDPDomain.c:184]) and checked to see what it 
could be complaining about.

It looked to me like it could be complaining about the uninitialised field 
cmsg.ipi.ipi_addr and possibly __cmsg_data.

Here is the pertinent code:

static int netsnmp_udp_sendto(int fd, struct in_addr *srcip,
                        struct sockaddr *remote, void *data, int len)
{
     struct iovec iov = { data, len };
     struct {
         struct cmsghdr cm;
         struct in_pktinfo ipi;
     } cmsg;
     struct msghdr m;

     cmsg.cm.cmsg_len = sizeof(struct cmsghdr) + sizeof(struct in_pktinfo);
     cmsg.cm.cmsg_level = SOL_IP;
     cmsg.cm.cmsg_type = IP_PKTINFO;
     cmsg.ipi.ipi_ifindex = 0;
     cmsg.ipi.ipi_spec_dst.s_addr = (srcip ? srcip->s_addr : INADDR_ANY);

     m.msg_name         = remote;
     m.msg_namelen      = sizeof(struct sockaddr_in);
     m.msg_iov          = &iov;
     m.msg_iovlen       = 1;
     m.msg_control      = &cmsg;
     m.msg_controllen   = sizeof(cmsg);
     m.msg_flags                = 0;

     return sendmsg(fd, &m, MSG_NOSIGNAL|MSG_DONTWAIT);
}


cmsghdr is declared in /usr/include/bits/socket.h:
struct cmsghdr
   {
     size_t cmsg_len;           /* Length of data in cmsg_data plus length
                                   of cmsghdr structure.
                                   !! The type should be socklen_t but the
                                   definition of the kernel is incompatible
                                   with this.  */
     int cmsg_level;            /* Originating protocol.  */
     int cmsg_type;             /* Protocol specific type.  */
#if (!defined __STRICT_ANSI__ && __GNUC__ >= 2) || __STDC_VERSION__ >= 199901L
     __extension__ unsigned char __cmsg_data __flexarr; /* Ancillary data.  */
#endif
   };


in_pktinfo is declared in /usr/include/linux/in.h:

struct in_pktinfo
{
        int             ipi_ifindex;
        struct in_addr  ipi_spec_dst;
        struct in_addr  ipi_addr;
};

I changed the code to initialise the ipi_addr by adding the following 
line to netsnmp_udp_sendto:

     cmsg.ipi.ipi_addr.s_addr = 0;

but still got the same valgrind error.
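
Rather than initialising one field at a time, one way to rule out every 
field and any compiler-inserted padding in a single step would be to zero 
the whole control structure before filling it in. The following is only a 
minimal sketch under that assumption: the type name pktinfo_cmsg and the 
helper init_pktinfo_cmsg are hypothetical names introduced here for 
illustration and are not part of net-snmp. The field assignments mirror 
the ones in netsnmp_udp_sendto above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes struct in_pktinfo on glibc */
#endif
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Same layout as the anonymous struct in netsnmp_udp_sendto.
 * The type name is hypothetical. */
struct pktinfo_cmsg {
    struct cmsghdr cm;
    struct in_pktinfo ipi;
};

/* Hypothetical helper: clear every byte of the control block first,
 * then set the fields the quoted code sets. */
void init_pktinfo_cmsg(struct pktinfo_cmsg *c, const struct in_addr *srcip)
{
    assert(c != NULL);

    /* Zero named fields (including ipi_addr) and any padding bytes. */
    memset(c, 0, sizeof(*c));

    c->cm.cmsg_len = sizeof(struct cmsghdr) + sizeof(struct in_pktinfo);
    c->cm.cmsg_level = SOL_IP;
    c->cm.cmsg_type = IP_PKTINFO;
    c->ipi.ipi_ifindex = 0;
    c->ipi.ipi_spec_dst.s_addr = (srcip ? srcip->s_addr : INADDR_ANY);
}
```

If valgrind stays quiet with the structure prepared this way, that would 
suggest the report concerns bytes that no field assignment ever touches.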

__cmsg_data appears to be a zero-sized array, since sizeof(struct cmsghdr) 
is 16 - and once cmsg.ipi.ipi_ifindex is set, gdb shows that __cmsg_data 
is empty.

This is what I get for cmsg after cmsg is initialised:

(gdb) n
177         m.msg_name          = remote;
(gdb) print cmsg
$7 = {cm = {cmsg_len = 28, cmsg_level = 0, cmsg_type = 8, __cmsg_data = 
0x7fffa860c370 ""}, ipi = {ipi_ifindex = 0, ipi_spec_dst = {s_addr = 0},
     ipi_addr = {s_addr = 0}}}

Did I miss anything?

The only thing I can think of is that cmsg.cm.cmsg_len is set to 28, but 
later, when m.msg_controllen is set to sizeof(cmsg), its value is 32 - I 
assume due to some kind of padding for boundary alignment? 
Could this be a possible cause of the valgrind error? If not, what?
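
The 28 versus 32 arithmetic can be checked directly. The sketch below is 
an illustration only: the type name pktinfo_cmsg and the three helper 
functions are hypothetical names, not net-snmp code. It compares the byte 
count the quoted code stores in cmsg_len with the byte count it passes as 
msg_controllen, and prints the standard CMSG_LEN and CMSG_SPACE macros 
for the same payload alongside them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes struct in_pktinfo on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Same layout as the anonymous struct in netsnmp_udp_sendto.
 * The type name is hypothetical. */
struct pktinfo_cmsg {
    struct cmsghdr cm;
    struct in_pktinfo ipi;
};

/* Bytes the quoted code stores in cmsg_len. */
size_t named_bytes(void)
{
    return sizeof(struct cmsghdr) + sizeof(struct in_pktinfo);
}

/* Bytes the quoted code passes as msg_controllen. */
size_t control_bytes(void)
{
    return sizeof(struct pktinfo_cmsg);
}

/* Bytes inside msg_controllen that cmsg_len does not account for. */
size_t trailing_padding(void)
{
    assert(control_bytes() >= named_bytes());
    return control_bytes() - named_bytes();
}

/* Print the sizes next to the CMSG_LEN/CMSG_SPACE values. */
void print_layout(void)
{
    printf("cmsg_len value        : %zu\n", named_bytes());
    printf("msg_controllen value  : %zu\n", control_bytes());
    printf("trailing padding      : %zu\n", trailing_padding());
    printf("CMSG_LEN(in_pktinfo)  : %zu\n",
           (size_t)CMSG_LEN(sizeof(struct in_pktinfo)));
    printf("CMSG_SPACE(in_pktinfo): %zu\n",
           (size_t)CMSG_SPACE(sizeof(struct in_pktinfo)));
}
```

Calling print_layout() from a small main() on the affected machine should 
show whether the difference matches the 28 and 32 seen in gdb. Any 
non-zero padding is memory that sendmsg is told to read but that no 
assignment in netsnmp_udp_sendto writes.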

Any opinions on whether this error could be causing my glibc double free 
errors, or whether it is a red herring?

valgrind error to follow.

Many thanks in advance,
Craig



==545== Memcheck, a memory error detector.
==545== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==545== Using LibVEX rev 1884, a library for dynamic binary translation.
==545== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==545== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
==545== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==545== For more details, rerun with: -v
==545==
==545== Warning: ignored attempt to set SIGKILL handler in sigaction();
==545==          the SIGKILL signal is uncatchable
==545== Syscall param socketcall.sendmsg(msg.msg_control) points to uninitialised byte(s)
==545==    at 0x3D0DED3C70: __sendmsg_nocancel (in /lib64/libc-2.5.so)
==545==    by 0x4C83542: netsnmp_udp_send (snmpUDPDomain.c:184)
==545==    by 0x4C5866F: snmp_sess_async_send (snmp_api.c:4862)
==545==    by 0x4C37F4A: snmp_synch_response_cb (snmp_client.c:999)
==545==    by 0x430147: MJSNMPPM::snmpwander(std::string, bool, int*, int) (mjsnmppm1.cpp:661)
==545==    by 0x43219F: MJSNMPPM::doCheck(bool) (mjsnmppm1.cpp:202)
==545==    by 0x40DDDB: MJ::checkLoop() (MJ.cpp:944)
==545==    by 0x430DBC: main (mjsnmppm1.cpp:942)
==545==  Address 0x7feffed58 is on thread 1's stack
==545==
==545== ERROR SUMMARY: 34 errors from 1 contexts (suppressed: 4 from 1)
==545== malloc/free: in use at exit: 397,033 bytes in 8,268 blocks.
==545== malloc/free: 22,788 allocs, 14,520 frees, 4,593,240 bytes allocated.
==545== For counts of detected errors, rerun with: -v
==545== Use --track-origins=yes to see where uninitialised values come from
==545== searching for pointers to 8,268 not-freed blocks.
==545== checked 1,795,472 bytes.
==545==
==545== LEAK SUMMARY:
==545==    definitely lost: 0 bytes in 0 blocks.
==545==      possibly lost: 128 bytes in 2 blocks.
==545==    still reachable: 396,905 bytes in 8,266 blocks.
==545==         suppressed: 0 bytes in 0 blocks.
==545== Rerun with --leak-check=full to see details of leaked memory.



