Hi Sam,

I'm not a maintainer for net-snmp, but it was merged into V5-8-patches yesterday.

Regards
Anders Wallin

On Wed, Apr 10, 2019 at 8:04 PM Sam Tannous <stann...@cumulusnetworks.com> wrote:

> Hi Anders,
>
> We've been testing the patch for
>
> https://sourceforge.net/p/net-snmp/bugs/2914/
>
> and this works.
>
> Before the patch, we were getting lots of core dumps where the flow in
> master.c hits the default case and memory gets double-freed and a core
> happens.
> We'd love to get this into V5-8-patches soon.
>
> Thanks,
> Sam
>
> On Tue, Apr 9, 2019 at 9:13 AM Sam Tannous <stann...@cumulusnetworks.com> wrote:
>
>> Hi Anders,
>>
>> I fixed some snmpv3 (bulkget) coredumps a while ago.
>> https://sourceforge.net/p/net-snmp/patches/1388/
>>
>> While not directly related, the (double-free memory) core dumps
>> were easily triggered by any error condition within a v3 bulkget.
>>
>> I'm hoping my patch will get picked up soon :-(
>>
>> Thanks,
>> Sam
>>
>> On Tue, Apr 9, 2019 at 6:54 AM Anders Wallin <walli...@gmail.com> wrote:
>>
>>> Now it works fine!
>>>
>>> thx
>>> Anders Wallin
>>>
>>> On Tue, Apr 9, 2019 at 2:26 AM Masayoshi Mizuma <msys.miz...@gmail.com> wrote:
>>>
>>>> Hi Anders,
>>>>
>>>> Thank you for your feedback!
>>>> I attach the v2 patch. Could you try it?
>>>>
>>>> On the v1 patch, I missed the check for the request callback. So, the request
>>>> gets removed even though the callback doesn't run.
>>>>
>>>> Thanks,
>>>> Masa
>>>>
>>>> On 4/8/19 11:06 AM, Anders Wallin wrote:
>>>> > Hi Masa,
>>>> >
>>>> > looks like it solves the problem reported by Josef, BUT it breaks DTLSUDP.
>>>> > I ran the tests w/o analyzing why.
>>>> > To reproduce the issue I did the following using net-snmp master branch,
>>>> > plus these patches
>>>> > 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the sending is failed (10 minutes ago) <Masayoshi Mizuma>
>>>> > 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin>
>>>> > a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days ago) <Anders Wallin>
>>>> > eaad09d04 - (origin/master, origin/HEAD, master) Merge branch 'V5-8-patches' (9 weeks ago) <Bart Van Assche>
>>>> >
>>>> > $ ./configure --prefix=/usr \
>>>> >     --with-persistent-directory=/var/lib/net-snmp \
>>>> >     --with-mib-modules='smux tlstm-mib tsm-mib examples/example examples/notification' \
>>>> >     --with-security-modules="tsm" \
>>>> >     --with-transports="TLSTCP DTLSUDP" \
>>>> >     --enable-shared \
>>>> >     --with-defaults \
>>>> >     --enable-ipv6 \
>>>> >     --with-cflags="-g -O2" \
>>>> >     --without-elf
>>>> >
>>>> > $ make install
>>>> > $ cd testing
>>>> > $ ./RUNFULLTESTS -g tls
>>>> > DTLS-UDP user certificate tests .......................... 41/?
>>>> > This hangs forever in "41" with snmpd.log saying....
>>>> > ......
>>>> > 2019-04-08 16:29:11
>>>> > 2019-04-08 16:29:11
>>>> > Received 0 byte packet from DTLSUDP: unknown
>>>> > 2019-04-08 16:29:11
>>>> > 2019-04-08 16:29:13
>>>> > Received 0 byte packet from DTLSUDP: unknown
>>>> > 2019-04-08 16:29:13
>>>> > 2019-04-08 16:29:15
>>>> > Received 0 byte packet from DTLSUDP: unknown
>>>> > 2019-04-08 16:29:15
>>>> > 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 depth=0 err=18:self signed certificate
>>>> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
>>>> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 1 (SSL_ERROR_SSL)
>>>> > 2019-04-08 16:29:15 TLS Error: certificate verify failed
>>>> > 2019-04-08 16:29:15 ---- End of OpenSSL Errors ----
>>>> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
>>>> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>>>> > 2019-04-08 16:29:15 TLS Error: (null)
>>>> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
>>>> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>>>> > 2019-04-08 16:29:16 TLS Error: (null)
>>>> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
>>>> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>>>> > 2019-04-08 16:29:16 TLS Error: (null)
>>>> >
>>>> > With the fix suggested by Josef I don't see the DTLSUDP problem, but maybe
>>>> > there are other problems.
>>>> >
>>>> > Regards
>>>> > Anders Wallin
>>>> >
>>>> > PS. thx for adding commit info to a420d87d3, I updated the patch with your
>>>> > commit comments
>>>> >
>>>> > On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com> wrote:
>>>> >
>>>> >> Hi Josef,
>>>> >>
>>>> >> I attach two patches to fix the memory inconsistency if the request is
>>>> >> resent and timed out.
>>>> >> Could you try them?
>>>> >>
>>>> >> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch
>>>> >>
>>>> >>   This patch was posted by Anders, and I tried to add the description.
>>>> >>   This patch fixes the missing NETSNMP_CALLBACK_OP_RESEND callback.
>>>> >>
>>>> >> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch
>>>> >>
>>>> >>   This patch fixes the race between the NETSNMP_CALLBACK_OP_SEND_FAILED
>>>> >>   and NETSNMP_CALLBACK_OP_TIMED_OUT callbacks. If the request has failed,
>>>> >>   then remove the request from the internal session.
>>>> >>
>>>> >> Thanks,
>>>> >> Masa
>>>> >>
>>>> >> On 4/3/19 9:34 AM, Anders Wallin wrote:
>>>> >>> The introduction of that code fixes another issue;
>>>> >>> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
>>>> >>> Author: Bill Fenner <fen...@gmail.com>
>>>> >>> Date: Wed Dec 20 21:52:10 2017 +0000
>>>> >>>
>>>> >>> NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 agent
>>>> >>>
>>>> >>> With the patch in 1214, the snmp_api code assumed that if magic was
>>>> >>> set, it was the "struct synch-state" from snmp_client. Of course,
>>>> >>> magic belongs to the caller, and the perl library uses it differently,
>>>> >>> so reaching into it is verboten. Introduce a new callback (that
>>>> >>> was already introduced in 5.8) to report this "retries exceeded"
>>>> >>> state, and use it in snmp_client."
>>>> >>>
>>>> >>> I think the problem is really about shutting down the agentx connection
>>>> >>> when one (1) response is too late. I have
>>>> >>> done 2 patches (one that only writes a better log message and one that
>>>> >>> removes the "bad" code).
>>>> >>> With these patches I don't get any crash. I think that 5.7.3 has this issue
>>>> >>> as well, but it cannot be crashed with the agentofdeath code
>>>> >>>
>>>> >>> Can you please try this?
>>>> >>>
>>>> >>> Regards
>>>> >>> Anders Wallin
>>>> >>>
>>>> >>> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:
>>>> >>>
>>>> >>>> Hi,
>>>> >>>>
>>>> >>>> I have compared net-snmp-5.7.3 and net-snmp-5.8 and I have found that
>>>> >>>> the following callbacks in snmplib/snmp_api.c cause the core dump issue:
>>>> >>>>
>>>> >>>> --- old/snmplib/snmp_api.c 2019-04-03 12:13:55.126769866 +0200
>>>> >>>> +++ new/snmplib/snmp_api.c 2019-04-03 12:15:18.353420790 +0200
>>>> >>>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
>>>> >>>>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
>>>> >>>>          sp->s_errno = errno;
>>>> >>>>          snmp_set_detail(strerror(errno));
>>>> >>>> -        if (rp->callback)
>>>> >>>> +/*        if (rp->callback)
>>>> >>>>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
>>>> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>>>> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>>>> >>>>          return -1;
>>>> >>>>      } else {
>>>> >>>>          netsnmp_get_monotonic_clock(&now);
>>>> >>>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
>>>> >>>>          tv.tv_sec += tv.tv_usec / 1000000L;
>>>> >>>>          tv.tv_usec %= 1000000L;
>>>> >>>>          rp->expireM = tv;
>>>> >>>> -        if (rp->callback)
>>>> >>>> +/*        if (rp->callback)
>>>> >>>>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>>>> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>>>> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>>>> >>>>      }
>>>> >>>>      return 0;
>>>> >>>>  }
>>>> >>>>
>>>> >>>> Without them, all works as expected.
>>>> >>>>
>>>> >>>> Josef Ridky
>>>> >>>> Software Engineer
>>>> >>>> Core Services Team
>>>> >>>> Red Hat Czech, s.r.o.
>>>> >>>>
>>>> >>>> ----- Original Message -----
>>>> >>>> | From: "Anders Wallin" <walli...@gmail.com>
>>>> >>>> | To: "Josef Ridky" <jri...@redhat.com>
>>>> >>>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>>>> >>>> | Sent: Tuesday, April 2, 2019 6:27:54 PM
>>>> >>>> | Subject: Re: Core dump with net-snmp-5.8
>>>> >>>> |
>>>> >>>> | Hi Josef,
>>>> >>>> | I can reproduce the issue using the master branch, I will take a look at it
>>>> >>>> | later tonight or tomorrow
>>>> >>>> |
>>>> >>>> | Regards
>>>> >>>> | Anders Wallin
>>>> >>>> |
>>>> >>>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
>>>> >>>> |
>>>> >>>> | > Hi,
>>>> >>>> | >
>>>> >>>> | > thanks for your patch. Unfortunately, even when I have applied it, it
>>>> >>>> | > still ends with a core dump due to 'double free or corruption (fasttop)'
>>>> >>>> | >
>>>> >>>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
>>>> >>>> | >
>>>> >>>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
>>>> >>>> | > snmp_agent: delegate session == 0x56207e165240
>>>> >>>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
>>>> >>>> | > agentx/master: callback resend
>>>> >>>> | > agentx/master: callback resend
>>>> >>>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
>>>> >>>> | > agentx/master: close 0x56207dfd5400, -1
>>>> >>>> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
>>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e165240
>>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e165240
>>>> >>>> | > snmp_agent: REMOVE session == 0x56207e165240
>>>> >>>> | > snmp_agent: agent_session 0x56207e165240 released
>>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
>>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
>>>> >>>> | > snmp_agent: REMOVE session == 0x56207e1041a0
>>>> >>>> | > snmp_agent: agent_session 0x56207e1041a0 released
>>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
>>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
>>>> >>>> | > snmp_agent: REMOVE session == 0x56207e1656c0
>>>> >>>> | > snmp_agent: agent_session 0x56207e1656c0 released
>>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
>>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11af40
>>>> >>>> | > snmp_agent: REMOVE session == 0x56207e11af40
>>>> >>>> | > snmp_agent: agent_session 0x56207e11af40 released
>>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
>>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e118f00
>>>> >>>> | > snmp_agent: REMOVE session == 0x56207e118f00
>>>> >>>> | > snmp_agent: agent_session 0x56207e118f00 released
>>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
>>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11b540
>>>> >>>> | > snmp_agent: REMOVE session == 0x56207e11b540
>>>> >>>> | > snmp_agent: agent_session 0x56207e11b540 released
>>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
>>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
>>>> >>>> | > snmp_agent: REMOVE session == 0x56207e11bd00
>>>> >>>> | > snmp_agent: agent_session 0x56207e11bd00 released
>>>> >>>> | > agentx/master: Continue removing delegated subsession reqests
>>>> >>>> | > agentx/master: close transport
>>>> >>>> | > snmp_agent: REMOVE session == 0x56207dfd5400
>>>> >>>> | > agentx/master: response too late on session 0x56207dfd5400
>>>> >>>> | > agentx/master: response too late on session 0x56207dfd5400
>>>> >>>> | > double free or corruption (fasttop)
>>>> >>>> | > Aborted (core dumped)
>>>> >>>> | >
>>>> >>>> | > What's interesting, when I run it with -DALL it passes (at least for several
>>>> >>>> | > rounds).
>>>> >>>> | > It looks like some strange race condition.
>>>> >>>> | >
>>>> >>>> | > Regards
>>>> >>>> | >
>>>> >>>> | > Josef Ridky
>>>> >>>> | > Software Engineer
>>>> >>>> | > Core Services Team
>>>> >>>> | > Red Hat Czech, s.r.o.
>>>> >>>> | >
>>>> >>>> | > ----- Original Message -----
>>>> >>>> | > | From: "Anders Wallin" <walli...@gmail.com>
>>>> >>>> | > | To: "Josef Ridky" <jri...@redhat.com>
>>>> >>>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>>>> >>>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
>>>> >>>> | > | Subject: Re: Core dump with net-snmp-5.8
>>>> >>>> | > |
>>>> >>>> | > | Hi Josef,
>>>> >>>> | > |
>>>> >>>> | > | I think it's the same issue as https://sourceforge.net/p/net-snmp/bugs/2914/
>>>> >>>> | > | (where I also posted the solution)
>>>> >>>> | > | Regards
>>>> >>>> | > | Anders Wallin
>>>> >>>> | > |
>>>> >>>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> wrote:
>>>> >>>> | > |
>>>> >>>> | > | > Hi,
>>>> >>>> | > | >
>>>> >>>> | > | > recently, I have hit an issue in net-snmp-5.8 that is connected to the
>>>> >>>> | > | > bug report [1].
>>>> >>>> | > | >
>>>> >>>> | > | > When I run the agentofdeath test from [1], the snmpd daemon crashes
>>>> >>>> | > | > with a "malloc(): smallbin double linked list corrupted" or "double free()" error
>>>> >>>> | > | > and dumps core (see below).
>>>> >>>> | > | > From the log file, I identified one issue with "Unknown operation".
>>>> >>>> | > | >
>>>> >>>> | > | > This issue is in the agentx_got_response function
>>>> >>>> | > | > (agent/mibgroup/agentx/master.c). No action is implemented for
>>>> >>>> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
>>>> >>>> | > | > include/net-snmp/library/snmp_api.h).
>>>> >>>> | > | > As a result, "Unknown operation 6 in agentx_got_response" is shown in the log
>>>> >>>> | > | > file.
>>>> >>>> | > | >
>>>> >>>> | > | > /var/log/messages
>>>> >>>> | > | > -------------------------------
>>>> >>>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
>>>> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
>>>> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double linked list corrupted
>>>> >>>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump (PID 13652/UID 0).
>>>> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
>>>> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with result 'core-dump'.
>>>> >>>> | > | > -------------------------------
>>>> >>>> | > | >
>>>> >>>> | > | > The "Unknown operation" callback is caused by a newly added piece of code in
>>>> >>>> | > | > snmplib/snmp_api.c:
>>>> >>>> | > | >
>>>> >>>> | > | > static int
>>>> >>>> | > | > snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
>>>> >>>> | > | >                     int incr_retries)
>>>> >>>> | > | > {
>>>> >>>> | > | >
>>>> >>>> | > | > ...
>>>> >>>> | > | >
>>>> >>>> | > | >         tv.tv_sec += tv.tv_usec / 1000000L;
>>>> >>>> | > | >         tv.tv_usec %= 1000000L;
>>>> >>>> | > | >         rp->expireM = tv;
>>>> >>>> | > | > +       if (rp->callback)
>>>> >>>> | > | > +           rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>>>> >>>> | > | > +                        rp->pdu->reqid, rp->pdu, rp->cb_data);
>>>> >>>> | > | >     }
>>>> >>>> | > | >     return 0;
>>>> >>>> | > | > }
>>>> >>>> | > | >
>>>> >>>> | > | > When I tried to remove it, it just stops complaining about operation 6, but
>>>> >>>> | > | > the core dump is still present.
>>>> >>>> | > | >
>>>> >>>> | > | > May I ask you for help with this issue? Do you have any idea what is causing
>>>> >>>> | > | > this issue in 5.8 and how to fix it?
>>>> >>>> | > | > I know that Jan Safranek fixed this for 5.7 with commit [2], but it
>>>> >>>> | > | > looks like something else has changed and this issue is present again.
>>>> >>>> | > | >
>>>> >>>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
>>>> >>>> | > | > [2] https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
>>>> >>>> | > | >
>>>> >>>> | > | > Regards
>>>> >>>> | > | >
>>>> >>>> | > | > Josef Ridky
>>>> >>>> | > | > Software Engineer
>>>> >>>> | > | > Core Services Team
>>>> >>>> | > | > Red Hat Czech, s.r.o.
>>>> >>>> | > | >
>>>> >>>> | > | > _______________________________________________
>>>> >>>> | > | > Net-snmp-coders mailing list
>>>> >>>> | > | > Net-snmp-coders@lists.sourceforge.net
>>>> >>>> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
>>
>> --
>> *Sam Tannous*
>> Engineering
>> Cumulus Networks®
>> +1 650 383 6700 x 1106
>> www.cumulusnetworks.com
>>
>> Evaluate Cumulus® Linux®
>> https://cumulusnetworks.com/product/secure/evaluate/
>>
>> Become a Partner
>> http://cumulusnetworks.com/partners/become-a-partner/
>
> --
> *Sam Tannous*
> Engineering
> Cumulus Networks®
> +1 650 383 6700 x 1106
> www.cumulusnetworks.com
>
> Evaluate Cumulus® Linux®
> https://cumulusnetworks.com/product/secure/evaluate/
>
> Become a Partner
> http://cumulusnetworks.com/partners/become-a-partner/