Hi Masa, looks like it solves the problem reported by Josef, BUT it breaks DTLSUDP. I run the tests w/o analyzing why. To reproduce the issue I did the following using net-snmp master branch, plus these patches 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the sending is failed (10 minutes ago) <Masayoshi Mizuma> 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin> a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days ago) <Anders Wallin> eaad09d04 - (origin/master, origin/HEAD, master) Merge branch 'V5-8-patches' (9 weeks ago) <Bart Van Assche>
$ ./configure --prefix=/usr \ --with-persistent-directory=/var/lib/net-snmp \ --with-mib-modules='smux tlstm-mib tsm-mib examples/example examples/notification' \ --with-security-modules="tsm" \ --with-transports="TLSTCP DTLSUDP" \ --enable-shared \ --with-defaults \ --enable-ipv6 \ --with-cflags="-g -O2" \ --without-elf $ make install $ cd testing $ ./RUNFULLTESTS -g tls DTLS-UDP user certificate tests .......................... 41/? This hangs forever in "41" with snmpd.log saying.... ...... 2019-04-08 16:29:11 2019-04-08 16:29:11 Received 0 byte packet from DTLSUDP: unknown 2019-04-08 16:29:11 2019-04-08 16:29:13 Received 0 byte packet from DTLSUDP: unknown 2019-04-08 16:29:13 2019-04-08 16:29:15 Received 0 byte packet from DTLSUDP: unknown 2019-04-08 16:29:15 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 depth=0 err=18:self signed certificate 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ---- 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 1 (SSL_ERROR_SSL) 2019-04-08 16:29:15 TLS Error: certificate verify failed 2019-04-08 16:29:15 ---- End of OpenSSL Errors ---- 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ---- 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success) 2019-04-08 16:29:15 TLS Error: (null) 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ---- 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success) 2019-04-08 16:29:16 TLS Error: (null) 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ---- 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success) 2019-04-08 16:29:16 TLS Error: (null) With the fix suggested på Josef I don't see the DTLSUDP problem, but maybe there are other problems. Regards Anders Wallin PS. thx for adding commit info to a420d87d3, I updated the patch with your commit comments On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com> wrote: > Hi Josef, > > I attach two patches to fix the memory inconsistency if the request is > resend and timed out. > Could you try them? > > - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch > > This patch was posted by Anders, and I tried to add the description. > This patch fixes the missing NETSNMP_CALLBACK_OP_RESEND callback. > > - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch > > This patch fixes the race between NETSNMP_CALLBACK_OP_SEND_FAILED > and NETSNMP_CALLBACK_OP_TIMED_OUT callback. If the request is failed, > then remove the request from the internal session. > > Thanks, > Masa > > On 4/3/19 9:34 AM, Anders Wallin wrote: > > The introduction of that code fixes another issue; > > "commit 56c30b11f3616ea4f0c38a21e08e78f050096020 > > Author: Bill Fenner <fen...@gmail.com> > > Date: Wed Dec 20 21:52:10 2017 +0000 > > > > NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 > > agent > > > > With the patch in 1214, the snmp_api code assumed that if magic was > > set, it was the "struct synch-state" from snmp_client. Of course, > > magic belongs to the caller, and the perl library uses it > differently, > > so reaching into it is verboten. Introduce a new callback (that > > was already introduced in 5.8) to report this "retries exceeded" > > state, and use it in snmp_client." > > > > I think the problem is really about shutting down the agentx connection > > when one(1) response is to late. I have > > done 2 patches (one that only write a better log message and one that > > removes the "bad" code. > > With these patches I don't get any crash. I think that 5.7.3 has this > issue > > as well, but it can not be crashed with the agentofdead code > > > > Can you please try this? > > > > Regards > > Anders Wallin > > > > > > On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote: > > > >> Hi, > >> > >> I have compared net-snmp-5.7.3 and net-snmp-5.8 and I have found, that > >> following callbacks in snmplib/snmp_api.c causes the core dump issue: > >> > >> --- old/snmplib/snmp_api.c 2019-04-03 12:13:55.126769866 +0200 > >> +++ new/snmplib/snmp_api.c 2019-04-03 12:15:18.353420790 +0200 > >> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list > >> sp->s_snmp_errno = SNMPERR_BAD_SENDTO; > >> sp->s_errno = errno; > >> snmp_set_detail(strerror(errno)); > >> - if (rp->callback) > >> +/* if (rp->callback) > >> rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp, > >> - rp->pdu->reqid, rp->pdu, rp->cb_data); > >> + rp->pdu->reqid, rp->pdu, rp->cb_data);*/ > >> return -1; > >> } else { > >> netsnmp_get_monotonic_clock(&now); > >> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list > >> tv.tv_sec += tv.tv_usec / 1000000L; > >> tv.tv_usec %= 1000000L; > >> rp->expireM = tv; > >> - if (rp->callback) > >> +/* if (rp->callback) > >> rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp, > >> - rp->pdu->reqid, rp->pdu, rp->cb_data); > >> + rp->pdu->reqid, rp->pdu, rp->cb_data);*/ > >> } > >> return 0; > >> } > >> > >> Without them, all works as expected. > >> > >> Josef Ridky > >> Software Engineer > >> Core Services Team > >> Red Hat Czech, s.r.o. > >> > >> ----- Original Message ----- > >> | From: "Anders Wallin" <walli...@gmail.com> > >> | To: "Josef Ridky" <jri...@redhat.com> > >> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net> > >> | Sent: Tuesday, April 2, 2019 6:27:54 PM > >> | Subject: Re: Core dump with net-snmp-5.8 > >> | > >> | Hi Josef, > >> | I can reproduce the issue using the master branch, I will take a look > at > >> it > >> | later tonight or tomorrow > >> | > >> | Regards > >> | Anders Wallin > >> | > >> | > >> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote: > >> | > >> | > Hi, > >> | > > >> | > thanks for your patch. Unfortunately, even when I have applied it, > it > >> | > still ends with core dump due of 'double free or corruption > (fasttop)' > >> | > > >> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with: > >> | > > >> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5) > >> | > snmp_agent: delegate session == 0x56207e165240 > >> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240 > >> | > agentx/master: callback resend > >> | > agentx/master: callback resend > >> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9 > >> | > agentx/master: close 0x56207dfd5400, -1 > >> | > snmp_agent: removed 40 delegated request(s) for session > 0x56207dfce490 > >> | > snmp_agent: processing delegated request, asp = 0x56207e165240 > >> | > snmp_agent: canceling next walk for asp 0x56207e165240 > >> | > snmp_agent: REMOVE session == 0x56207e165240 > >> | > snmp_agent: agent_session 0x56207e165240 released > >> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0 > >> | > snmp_agent: canceling next walk for asp 0x56207e1041a0 > >> | > snmp_agent: REMOVE session == 0x56207e1041a0 > >> | > snmp_agent: agent_session 0x56207e1041a0 released > >> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0 > >> | > snmp_agent: canceling next walk for asp 0x56207e1656c0 > >> | > snmp_agent: REMOVE session == 0x56207e1656c0 > >> | > snmp_agent: agent_session 0x56207e1656c0 released > >> | > snmp_agent: processing delegated request, asp = 0x56207e11af40 > >> | > snmp_agent: canceling next walk for asp 0x56207e11af40 > >> | > snmp_agent: REMOVE session == 0x56207e11af40 > >> | > snmp_agent: agent_session 0x56207e11af40 released > >> | > snmp_agent: processing delegated request, asp = 0x56207e118f00 > >> | > snmp_agent: canceling next walk for asp 0x56207e118f00 > >> | > snmp_agent: REMOVE session == 0x56207e118f00 > >> | > snmp_agent: agent_session 0x56207e118f00 released > >> | > snmp_agent: processing delegated request, asp = 0x56207e11b540 > >> | > snmp_agent: canceling next walk for asp 0x56207e11b540 > >> | > snmp_agent: REMOVE session == 0x56207e11b540 > >> | > snmp_agent: agent_session 0x56207e11b540 released > >> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00 > >> | > snmp_agent: canceling next walk for asp 0x56207e11bd00 > >> | > snmp_agent: REMOVE session == 0x56207e11bd00 > >> | > snmp_agent: agent_session 0x56207e11bd00 released > >> | > agentx/master: Continue removing delegated subsession reqests > >> | > agentx/master: close transport > >> | > snmp_agent: REMOVE session == 0x56207dfd5400 > >> | > agentx/master: response too late on session 0x56207dfd5400 > >> | > agentx/master: response too late on session 0x56207dfd5400 > >> | > double free or corruption (fasttop) > >> | > Aborted (core dumped) > >> | > > >> | > > >> | > What's interesting, when I run it with -DALL it pass (at least for > >> several > >> | > rounds). > >> | > It looks like some strange race condition. > >> | > > >> | > Regards > >> | > > >> | > Josef Ridky > >> | > Software Engineer > >> | > Core Services Team > >> | > Red Hat Czech, s.r.o. > >> | > > >> | > ----- Original Message ----- > >> | > | From: "Anders Wallin" <walli...@gmail.com> > >> | > | To: "Josef Ridky" <jri...@redhat.com> > >> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net> > >> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM > >> | > | Subject: Re: Core dump with net-snmp-5.8 > >> | > | > >> | > | Hi Josef, > >> | > | > >> | > | I think it's the same issue as > >> | > https://sourceforge.net/p/net-snmp/bugs/2914/ > >> | > | (where I also posted the solution) > >> | > | Regards > >> | > | Anders Wallin > >> | > | > >> | > | > >> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> > >> wrote: > >> | > | > >> | > | > Hi, > >> | > | > > >> | > | > recently, I have hit to an issue in net-snmp-5.8, that is > >> connected to > >> | > the > >> | > | > bug report [1]. > >> | > | > > >> | > | > When I tried to run agentofdeath test from [1], snmpd daemon > will > >> crash > >> | > | > with malloc(): smallbin double linked list corrupted or double > >> free() > >> | > issue > >> | > | > and dumps core (see bellow). > >> | > | > From log file, I can identified one issue with "Unknown > operation". > >> | > | > > >> | > | > This issue is in the agentx_got_response function > >> | > | > (agent/mibgroup/agentx/master.c). There isn't implemented action > >> for > >> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in > >> | > | > include/net-snmp/library/snmp_api.h). > >> | > | > As result "Unknown operation 6 in agentx_got_response" is shown > in > >> log > >> | > | > file. > >> | > | > > >> | > | > /var/log/messages > >> | > | > ------------------------------- > >> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in > >> | > | > agentx_got_response > >> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in > >> | > | > agentx_got_response > >> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin > double > >> | > linked > >> | > | > list corrupted > >> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump > >> (PID > >> | > | > 13652/UID 0). > >> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main > process > >> | > exited, > >> | > | > code=dumped, status=6/ABRT > >> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with > >> result > >> | > | > 'core-dump'. > >> | > | > ------------------------------- > >> | > | > > >> | > | > The "Unknown operation" callback is caused by newly added piece > of > >> | > code in > >> | > | > snmplib/snmp_api.c: > >> | > | > > >> | > | > static int > >> | > | > snmp_resend_request(struct session_list *slp, > netsnmp_request_list > >> | > *rp, > >> | > | > int incr_retries) > >> | > | > { > >> | > | > > >> | > | > ... > >> | > | > > >> | > | > tv.tv_sec += tv.tv_usec / 1000000L; > >> | > | > tv.tv_usec %= 1000000L; > >> | > | > rp->expireM = tv; > >> | > | > + if (rp->callback) > >> | > | > + rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp, > >> | > | > + rp->pdu->reqid, rp->pdu, rp->cb_data); > >> | > | > } > >> | > | > return 0; > >> | > | > } > >> | > | > > >> | > | > > >> | > | > When I tried to remove it, it just stop complaining about > >> operation 6, > >> | > but > >> | > | > the core dump is still present. > >> | > | > > >> | > | > May I ask you for help with this issue? Do you have any idea, > what > >> | > causing > >> | > | > this issue in 5.8 and how to fix it? > >> | > | > I know, that Jan Safranek has fixed this for 5.7 by commit [2], > >> but it > >> | > | > looks like something other has changed and this issue is current > >> again. > >> | > | > > >> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/ > >> | > | > [2] > >> | > | > > >> | > > >> > https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df > >> | > | > > >> | > | > Regards > >> | > | > > >> | > | > Josef Ridky > >> | > | > Software Engineer > >> | > | > Core Services Team > >> | > | > Red Hat Czech, s.r.o. > >> | > | > > >> | > | > > >> | > | > > >> | > | > _______________________________________________ > >> | > | > Net-snmp-coders mailing list > >> | > | > Net-snmp-coders@lists.sourceforge.net > >> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders > >> | > | > > >> | > | > >> | > > >> | > >> > > > > > > > > _______________________________________________ > > Net-snmp-coders mailing list > > Net-snmp-coders@lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders > > >
_______________________________________________ Net-snmp-coders mailing list Net-snmp-coders@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/net-snmp-coders