Hi Anders,

I fixed some SNMPv3 (bulkget) core dumps a while ago:
https://sourceforge.net/p/net-snmp/patches/1388/
While not directly related, the (double-free memory) core dumps were easily
triggered by any error condition within a v3 bulkget. I'm hoping my patch
will get picked up soon :-(

Thanks,
Sam

On Tue, Apr 9, 2019 at 6:54 AM Anders Wallin <walli...@gmail.com> wrote:

> Now it works fine!
>
> thx
> Anders Wallin
>
>
> On Tue, Apr 9, 2019 at 2:26 AM Masayoshi Mizuma <msys.miz...@gmail.com> wrote:
>
>> Hi Anders,
>>
>> Thank you for your feedback!
>> I attach the v2 patch. Could you try it?
>>
>> On the v1 patch, I missed the check for the request callback. So, the
>> request gets removed even though the callback doesn't run.
>>
>> Thanks,
>> Masa
>>
>> On 4/8/19 11:06 AM, Anders Wallin wrote:
>> > Hi Masa,
>> >
>> > looks like it solves the problem reported by Josef, BUT it breaks DTLSUDP.
>> > I run the tests w/o analyzing why.
>> > To reproduce the issue I did the following using net-snmp master branch,
>> > plus these patches
>> > 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the sending is failed (10 minutes ago) <Masayoshi Mizuma>
>> > 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin>
>> > a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days ago) <Anders Wallin>
>> > eaad09d04 - (origin/master, origin/HEAD, master) Merge branch 'V5-8-patches' (9 weeks ago) <Bart Van Assche>
>> >
>> > $ ./configure --prefix=/usr \
>> >     --with-persistent-directory=/var/lib/net-snmp \
>> >     --with-mib-modules='smux tlstm-mib tsm-mib examples/example examples/notification' \
>> >     --with-security-modules="tsm" \
>> >     --with-transports="TLSTCP DTLSUDP" \
>> >     --enable-shared \
>> >     --with-defaults \
>> >     --enable-ipv6 \
>> >     --with-cflags="-g -O2" \
>> >     --without-elf
>> >
>> > $ make install
>> > $ cd testing
>> > $ ./RUNFULLTESTS -g tls
>> > DTLS-UDP user certificate tests .......................... 41/?
>> > This hangs forever in "41" with snmpd.log saying....
>> > ......
>> > 2019-04-08 16:29:11
>> > 2019-04-08 16:29:11
>> > Received 0 byte packet from DTLSUDP: unknown
>> > 2019-04-08 16:29:11
>> > 2019-04-08 16:29:13
>> > Received 0 byte packet from DTLSUDP: unknown
>> > 2019-04-08 16:29:13
>> > 2019-04-08 16:29:15
>> > Received 0 byte packet from DTLSUDP: unknown
>> > 2019-04-08 16:29:15
>> > 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 depth=0 err=18:self signed certificate
>> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
>> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 1 (SSL_ERROR_SSL)
>> > 2019-04-08 16:29:15 TLS Error: certificate verify failed
>> > 2019-04-08 16:29:15 ---- End of OpenSSL Errors ----
>> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
>> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>> > 2019-04-08 16:29:15 TLS Error: (null)
>> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
>> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>> > 2019-04-08 16:29:16 TLS Error: (null)
>> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
>> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>> > 2019-04-08 16:29:16 TLS Error: (null)
>> >
>> > With the fix suggested by Josef I don't see the DTLSUDP problem, but maybe
>> > there are other problems.
>> >
>> > Regards
>> > Anders Wallin
>> >
>> > PS. thx for adding commit info to a420d87d3, I updated the patch with your
>> > commit comments
>> >
>> >
>> > On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com> wrote:
>> >
>> >> Hi Josef,
>> >>
>> >> I attach two patches to fix the memory inconsistency if the request is
>> >> resent and times out.
>> >> Could you try them?
>> >>
>> >> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch
>> >>
>> >>   This patch was posted by Anders, and I tried to add the description.
>> >>   This patch fixes the missing NETSNMP_CALLBACK_OP_RESEND callback.
>> >>
>> >> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch
>> >>
>> >>   This patch fixes the race between the NETSNMP_CALLBACK_OP_SEND_FAILED
>> >>   and NETSNMP_CALLBACK_OP_TIMED_OUT callbacks. If the request fails,
>> >>   then remove the request from the internal session.
>> >>
>> >> Thanks,
>> >> Masa
>> >>
>> >> On 4/3/19 9:34 AM, Anders Wallin wrote:
>> >>> The introduction of that code fixes another issue;
>> >>> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
>> >>> Author: Bill Fenner <fen...@gmail.com>
>> >>> Date:   Wed Dec 20 21:52:10 2017 +0000
>> >>>
>> >>>     NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 agent
>> >>>
>> >>>     With the patch in 1214, the snmp_api code assumed that if magic was
>> >>>     set, it was the "struct synch-state" from snmp_client. Of course,
>> >>>     magic belongs to the caller, and the perl library uses it differently,
>> >>>     so reaching into it is verboten. Introduce a new callback (that
>> >>>     was already introduced in 5.8) to report this "retries exceeded"
>> >>>     state, and use it in snmp_client."
>> >>>
>> >>> I think the problem is really about shutting down the agentx connection
>> >>> when one (1) response is too late. I have done 2 patches (one that only
>> >>> writes a better log message and one that removes the "bad" code).
>> >>> With these patches I don't get any crash. I think that 5.7.3 has this issue
>> >>> as well, but it cannot be crashed with the agentofdeath code.
>> >>>
>> >>> Can you please try this?
>> >>>
>> >>> Regards
>> >>> Anders Wallin
>> >>>
>> >>>
>> >>> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I have compared net-snmp-5.7.3 and net-snmp-5.8 and I have found that the
>> >>>> following callbacks in snmplib/snmp_api.c cause the core dump issue:
>> >>>>
>> >>>> --- old/snmplib/snmp_api.c 2019-04-03 12:13:55.126769866 +0200
>> >>>> +++ new/snmplib/snmp_api.c 2019-04-03 12:15:18.353420790 +0200
>> >>>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
>> >>>>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
>> >>>>          sp->s_errno = errno;
>> >>>>          snmp_set_detail(strerror(errno));
>> >>>> -        if (rp->callback)
>> >>>> +/*      if (rp->callback)
>> >>>>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
>> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>> >>>>          return -1;
>> >>>>      } else {
>> >>>>          netsnmp_get_monotonic_clock(&now);
>> >>>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
>> >>>>          tv.tv_sec += tv.tv_usec / 1000000L;
>> >>>>          tv.tv_usec %= 1000000L;
>> >>>>          rp->expireM = tv;
>> >>>> -        if (rp->callback)
>> >>>> +/*      if (rp->callback)
>> >>>>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>> >>>>      }
>> >>>>      return 0;
>> >>>>  }
>> >>>>
>> >>>> Without them, all works as expected.
>> >>>>
>> >>>> Josef Ridky
>> >>>> Software Engineer
>> >>>> Core Services Team
>> >>>> Red Hat Czech, s.r.o.
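The two calls disabled in the diff above are the SEND_FAILED and RESEND notifications. As a rough illustration of why a send-failure notification can interact badly with a later timeout, here is a minimal sketch in C. Everything in it (the sketch_* types, functions and counter) is hypothetical; it is not the net-snmp code or the actual patch. It only models the idea described in this thread: once the callback has been told that the send failed, the request is taken off the pending list so the timeout sweep cannot notify, and release, the same request a second time; when no callback ran, the request stays queued.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are not net-snmp types. */
enum sketch_op { SKETCH_OP_TIMED_OUT, SKETCH_OP_SEND_FAILED };

struct sketch_request;
typedef void (*sketch_cb)(enum sketch_op op, struct sketch_request *rp);

struct sketch_request {
    struct sketch_request *next;
    int        reqid;
    sketch_cb  callback;   /* may be NULL */
    void      *cb_data;    /* released by the callback on a final notification */
};

struct sketch_session {
    struct sketch_request *pending;   /* requests still awaiting a response */
};

int sketch_release_count;             /* how many times cb_data was released */

/* Example callback: both notifications are final, so it frees cb_data. */
void sketch_release_cb(enum sketch_op op, struct sketch_request *rp)
{
    (void)op;
    free(rp->cb_data);
    rp->cb_data = NULL;
    sketch_release_count++;
}

/* Queue a new request at the head of the pending list. */
struct sketch_request *
sketch_add(struct sketch_session *sp, int reqid, sketch_cb cb)
{
    struct sketch_request *rp = calloc(1, sizeof(*rp));
    if (rp == NULL)
        return NULL;
    rp->reqid = reqid;
    rp->callback = cb;
    rp->cb_data = cb ? malloc(16) : NULL;
    rp->next = sp->pending;
    sp->pending = rp;
    return rp;
}

/* Unlink rp from the pending list without freeing it. */
void sketch_unlink(struct sketch_session *sp, struct sketch_request *rp)
{
    struct sketch_request **pp = &sp->pending;
    while (*pp != NULL && *pp != rp)
        pp = &(*pp)->next;
    if (*pp == rp)
        *pp = rp->next;
}

/* Send failure: notify, then drop the request so the timeout path cannot
 * notify it again. Returns 1 if the request was removed, 0 otherwise. */
int sketch_send_failed(struct sketch_session *sp, struct sketch_request *rp)
{
    if (rp->callback == NULL)
        return 0;                     /* no callback ran: leave it queued */
    rp->callback(SKETCH_OP_SEND_FAILED, rp);
    sketch_unlink(sp, rp);
    free(rp);
    return 1;
}

/* Timeout sweep: notify and free every request still pending.
 * Returns the number of requests that were expired. */
int sketch_expire_all(struct sketch_session *sp)
{
    int n = 0;
    while (sp->pending != NULL) {
        struct sketch_request *rp = sp->pending;
        sp->pending = rp->next;
        if (rp->callback != NULL)
            rp->callback(SKETCH_OP_TIMED_OUT, rp);
        free(rp);
        n++;
    }
    return n;
}
```

In this model a request whose send failed is notified exactly once, because sketch_expire_all never sees it again. If sketch_send_failed left the request on the list, the sweep would call the same callback a second time on state that had already been released.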
>> >>>>
>> >>>> ----- Original Message -----
>> >>>> | From: "Anders Wallin" <walli...@gmail.com>
>> >>>> | To: "Josef Ridky" <jri...@redhat.com>
>> >>>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>> >>>> | Sent: Tuesday, April 2, 2019 6:27:54 PM
>> >>>> | Subject: Re: Core dump with net-snmp-5.8
>> >>>> |
>> >>>> | Hi Josef,
>> >>>> | I can reproduce the issue using the master branch, I will take a look at it
>> >>>> | later tonight or tomorrow
>> >>>> |
>> >>>> | Regards
>> >>>> | Anders Wallin
>> >>>> |
>> >>>> |
>> >>>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
>> >>>> |
>> >>>> | > Hi,
>> >>>> | >
>> >>>> | > thanks for your patch. Unfortunately, even when I have applied it, it
>> >>>> | > still ends with a core dump because of 'double free or corruption (fasttop)'
>> >>>> | >
>> >>>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
>> >>>> | >
>> >>>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
>> >>>> | > snmp_agent: delegate session == 0x56207e165240
>> >>>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
>> >>>> | > agentx/master: callback resend
>> >>>> | > agentx/master: callback resend
>> >>>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
>> >>>> | > agentx/master: close 0x56207dfd5400, -1
>> >>>> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e165240
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e165240
>> >>>> | > snmp_agent: REMOVE session == 0x56207e165240
>> >>>> | > snmp_agent: agent_session 0x56207e165240 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
>> >>>> | > snmp_agent: REMOVE session == 0x56207e1041a0
>> >>>> | > snmp_agent: agent_session 0x56207e1041a0 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
>> >>>> | > snmp_agent: REMOVE session == 0x56207e1656c0
>> >>>> | > snmp_agent: agent_session 0x56207e1656c0 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11af40
>> >>>> | > snmp_agent: REMOVE session == 0x56207e11af40
>> >>>> | > snmp_agent: agent_session 0x56207e11af40 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e118f00
>> >>>> | > snmp_agent: REMOVE session == 0x56207e118f00
>> >>>> | > snmp_agent: agent_session 0x56207e118f00 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11b540
>> >>>> | > snmp_agent: REMOVE session == 0x56207e11b540
>> >>>> | > snmp_agent: agent_session 0x56207e11b540 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
>> >>>> | > snmp_agent: REMOVE session == 0x56207e11bd00
>> >>>> | > snmp_agent: agent_session 0x56207e11bd00 released
>> >>>> | > agentx/master: Continue removing delegated subsession reqests
>> >>>> | > agentx/master: close transport
>> >>>> | > snmp_agent: REMOVE session == 0x56207dfd5400
>> >>>> | > agentx/master: response too late on session 0x56207dfd5400
>> >>>> | > agentx/master: response too late on session 0x56207dfd5400
>> >>>> | > double free or corruption (fasttop)
>> >>>> | > Aborted (core dumped)
>> >>>> | >
>> >>>> | >
>> >>>> | > What's interesting, when I run it with -DALL it passes (at least for
>> >>>> | > several rounds).
>> >>>> | > It looks like some strange race condition.
>> >>>> | >
>> >>>> | > Regards
>> >>>> | >
>> >>>> | > Josef Ridky
>> >>>> | > Software Engineer
>> >>>> | > Core Services Team
>> >>>> | > Red Hat Czech, s.r.o.
>> >>>> | >
>> >>>> | > ----- Original Message -----
>> >>>> | > | From: "Anders Wallin" <walli...@gmail.com>
>> >>>> | > | To: "Josef Ridky" <jri...@redhat.com>
>> >>>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>> >>>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
>> >>>> | > | Subject: Re: Core dump with net-snmp-5.8
>> >>>> | > |
>> >>>> | > | Hi Josef,
>> >>>> | > |
>> >>>> | > | I think it's the same issue as
>> >>>> | > | https://sourceforge.net/p/net-snmp/bugs/2914/
>> >>>> | > | (where I also posted the solution)
>> >>>> | > | Regards
>> >>>> | > | Anders Wallin
>> >>>> | > |
>> >>>> | > |
>> >>>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> wrote:
>> >>>> | > |
>> >>>> | > | > Hi,
>> >>>> | > | >
>> >>>> | > | > recently, I have hit an issue in net-snmp-5.8 that is connected to the
>> >>>> | > | > bug report [1].
>> >>>> | > | >
>> >>>> | > | > When I tried to run the agentofdeath test from [1], the snmpd daemon
>> >>>> | > | > crashes with a malloc(): smallbin double linked list corrupted or
>> >>>> | > | > double free() error and dumps core (see below).
>> >>>> | > | > From the log file, I can identify one issue with "Unknown operation".
>> >>>> | > | >
>> >>>> | > | > This issue is in the agentx_got_response function
>> >>>> | > | > (agent/mibgroup/agentx/master.c). No action is implemented for
>> >>>> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
>> >>>> | > | > include/net-snmp/library/snmp_api.h).
>> >>>> | > | > As a result, "Unknown operation 6 in agentx_got_response" is shown in
>> >>>> | > | > the log file.
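For anyone following the thread, the "Unknown operation 6" message described above is what a callback's default branch reports when it has no case for the resend notification. Below is a minimal sketch of that idea in C. It is not the actual agentx_got_response code: the demo_* names and action codes are hypothetical, and the only value taken from this thread is 6 for the resend operation (the other numbers are assumptions). The point is that a resend is neither a response nor a failure, so the handler simply returns and leaves the pending request alone.

```c
#include <stdio.h>

/* Hypothetical operation codes for this sketch. Only the value 6 for
 * "resend" comes from this thread; the other values are assumptions. */
enum demo_op {
    DEMO_OP_RECEIVED    = 1,
    DEMO_OP_TIMED_OUT   = 2,
    DEMO_OP_SEND_FAILED = 3,
    DEMO_OP_RESEND      = 6
};

/* What the handler decides to do with the pending request. */
enum demo_action {
    DEMO_ACT_PROCESS,       /* a response arrived: handle it */
    DEMO_ACT_FAIL,          /* give up on the request */
    DEMO_ACT_KEEP_WAITING,  /* nothing to do yet, request stays pending */
    DEMO_ACT_UNKNOWN        /* operation not recognised */
};

/* Map a callback operation code to the action the handler should take. */
enum demo_action demo_classify_op(int op)
{
    switch (op) {
    case DEMO_OP_RECEIVED:
        return DEMO_ACT_PROCESS;
    case DEMO_OP_TIMED_OUT:
    case DEMO_OP_SEND_FAILED:
        return DEMO_ACT_FAIL;
    case DEMO_OP_RESEND:
        /* A retransmission was sent; the request is still in flight. */
        return DEMO_ACT_KEEP_WAITING;
    default:
        fprintf(stderr, "Unknown operation %d in demo handler\n", op);
        return DEMO_ACT_UNKNOWN;
    }
}
```

With an explicit case for the resend code, demo_classify_op(6) yields DEMO_ACT_KEEP_WAITING instead of falling through to the default branch and logging an unknown operation.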
>> >>>> | > | >
>> >>>> | > | > /var/log/messages
>> >>>> | > | > -------------------------------
>> >>>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
>> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
>> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double linked list corrupted
>> >>>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump (PID 13652/UID 0).
>> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
>> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with result 'core-dump'.
>> >>>> | > | > -------------------------------
>> >>>> | > | >
>> >>>> | > | > The "Unknown operation" callback is caused by a newly added piece of
>> >>>> | > | > code in snmplib/snmp_api.c:
>> >>>> | > | >
>> >>>> | > | > static int
>> >>>> | > | > snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
>> >>>> | > | >                     int incr_retries)
>> >>>> | > | > {
>> >>>> | > | >
>> >>>> | > | > ...
>> >>>> | > | >
>> >>>> | > | >         tv.tv_sec += tv.tv_usec / 1000000L;
>> >>>> | > | >         tv.tv_usec %= 1000000L;
>> >>>> | > | >         rp->expireM = tv;
>> >>>> | > | > +       if (rp->callback)
>> >>>> | > | > +           rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>> >>>> | > | > +                        rp->pdu->reqid, rp->pdu, rp->cb_data);
>> >>>> | > | >     }
>> >>>> | > | >     return 0;
>> >>>> | > | > }
>> >>>> | > | >
>> >>>> | > | >
>> >>>> | > | > When I tried to remove it, it just stops complaining about operation 6,
>> >>>> | > | > but the core dump is still present.
>> >>>> | > | >
>> >>>> | > | > May I ask you for help with this issue? Do you have any idea what is
>> >>>> | > | > causing this issue in 5.8 and how to fix it?
>> >>>> | > | > I know that Jan Safranek has fixed this for 5.7 by commit [2], but it
>> >>>> | > | > looks like something else has changed and this issue is current again.
>> >>>> | > | >
>> >>>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
>> >>>> | > | > [2] https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
>> >>>> | > | >
>> >>>> | > | > Regards
>> >>>> | > | >
>> >>>> | > | > Josef Ridky
>> >>>> | > | > Software Engineer
>> >>>> | > | > Core Services Team
>> >>>> | > | > Red Hat Czech, s.r.o.
>> >>>> | > | >
>> >>>> | > | > _______________________________________________
>> >>>> | > | > Net-snmp-coders mailing list
>> >>>> | > | > Net-snmp-coders@lists.sourceforge.net
>> >>>> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders

--
Sam Tannous
Engineering
Cumulus Networks®
+1 650 383 6700 x 1106
www.cumulusnetworks.com