Hi Anders,

We've been testing the patch for
https://sourceforge.net/p/net-snmp/bugs/2914/ and it works.
Before the patch, we were getting lots of core dumps: the flow in
master.c hits the default case, memory gets double-freed, and snmpd
dumps core.
We'd love to get this into V5-8-patches soon.

Thanks,
Sam

On Tue, Apr 9, 2019 at 9:13 AM Sam Tannous <stann...@cumulusnetworks.com> wrote:

> Hi Anders,
>
> I fixed some snmpv3 (bulkget) coredumps a while ago.
> https://sourceforge.net/p/net-snmp/patches/1388/
>
> While not directly related, the (double-free memory) core dumps
> were easily triggered by any error condition within a v3 bulkget.
>
> I'm hoping my patch will get picked up soon :-(
>
> Thanks,
> Sam
>
> On Tue, Apr 9, 2019 at 6:54 AM Anders Wallin <walli...@gmail.com> wrote:
>
>> Now it works fine!
>>
>> thx
>> Anders Wallin
>>
>>
>> On Tue, Apr 9, 2019 at 2:26 AM Masayoshi Mizuma <msys.miz...@gmail.com>
>> wrote:
>>
>>> Hi Anders,
>>>
>>> Thank you for your feedback!
>>> I attach the v2 patch. Could you try it?
>>>
>>> On the v1 patch, I missed the check for the request callback. So, the
>>> request gets removed even though the callback doesn't run.
>>>
>>> Thanks,
>>> Masa
>>>
>>> On 4/8/19 11:06 AM, Anders Wallin wrote:
>>> > Hi Masa,
>>> >
>>> > looks like it solves the problem reported by Josef, BUT it breaks DTLSUDP.
>>> > I run the tests w/o analyzing why.
>>> > To reproduce the issue I did the following using net-snmp master branch,
>>> > plus these patches
>>> > 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the sending is failed (10 minutes ago) <Masayoshi Mizuma>
>>> > 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin>
>>> > a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days ago) <Anders Wallin>
>>> > eaad09d04 - (origin/master, origin/HEAD, master) Merge branch 'V5-8-patches' (9 weeks ago) <Bart Van Assche>
>>> >
>>> > $ ./configure --prefix=/usr \
>>> >       --with-persistent-directory=/var/lib/net-snmp \
>>> >       --with-mib-modules='smux tlstm-mib tsm-mib examples/example examples/notification' \
>>> >       --with-security-modules="tsm" \
>>> >       --with-transports="TLSTCP DTLSUDP" \
>>> >       --enable-shared \
>>> >       --with-defaults \
>>> >       --enable-ipv6 \
>>> >       --with-cflags="-g -O2" \
>>> >       --without-elf
>>> >
>>> > $ make install
>>> > $ cd testing
>>> > $ ./RUNFULLTESTS -g tls
>>> > DTLS-UDP user certificate tests .......................... 41/?
>>> > This hangs forever in "41" with snmpd.log saying....
>>> > ......
>>> > 2019-04-08 16:29:11
>>> > 2019-04-08 16:29:11
>>> > Received 0 byte packet from DTLSUDP: unknown
>>> > 2019-04-08 16:29:11
>>> > 2019-04-08 16:29:13
>>> > Received 0 byte packet from DTLSUDP: unknown
>>> > 2019-04-08 16:29:13
>>> > 2019-04-08 16:29:15
>>> > Received 0 byte packet from DTLSUDP: unknown
>>> > 2019-04-08 16:29:15
>>> > 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 depth=0 err=18:self signed certificate
>>> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
>>> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 1 (SSL_ERROR_SSL)
>>> > 2019-04-08 16:29:15 TLS Error: certificate verify failed
>>> > 2019-04-08 16:29:15 ---- End of OpenSSL Errors ----
>>> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
>>> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>>> > 2019-04-08 16:29:15 TLS Error: (null)
>>> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
>>> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>>> > 2019-04-08 16:29:16 TLS Error: (null)
>>> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
>>> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>>> > 2019-04-08 16:29:16 TLS Error: (null)
>>> >
>>> > With the fix suggested by Josef I don't see the DTLSUDP problem, but maybe
>>> > there are other problems.
>>> >
>>> > Regards
>>> > Anders Wallin
>>> >
>>> > PS. thx for adding commit info to a420d87d3, I updated the patch with your
>>> > commit comments
>>> >
>>> >
>>> > On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com>
>>> > wrote:
>>> >
>>> >> Hi Josef,
>>> >>
>>> >> I attach two patches to fix the memory inconsistency if the request is
>>> >> resend and timed out.
>>> >> Could you try them?
>>> >>
>>> >> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch
>>> >>
>>> >>   This patch was posted by Anders, and I tried to add the description.
>>> >>   This patch fixes the missing NETSNMP_CALLBACK_OP_RESEND callback.
>>> >>
>>> >> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch
>>> >>
>>> >>   This patch fixes the race between NETSNMP_CALLBACK_OP_SEND_FAILED
>>> >>   and NETSNMP_CALLBACK_OP_TIMED_OUT callback. If the request is failed,
>>> >>   then remove the request from the internal session.
>>> >>
>>> >> Thanks,
>>> >> Masa
>>> >>
>>> >> On 4/3/19 9:34 AM, Anders Wallin wrote:
>>> >>> The introduction of that code fixes another issue;
>>> >>> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
>>> >>> Author: Bill Fenner <fen...@gmail.com>
>>> >>> Date: Wed Dec 20 21:52:10 2017 +0000
>>> >>>
>>> >>>     NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 agent
>>> >>>
>>> >>>     With the patch in 1214, the snmp_api code assumed that if magic was
>>> >>>     set, it was the "struct synch-state" from snmp_client. Of course,
>>> >>>     magic belongs to the caller, and the perl library uses it differently,
>>> >>>     so reaching into it is verboten. Introduce a new callback (that
>>> >>>     was already introduced in 5.8) to report this "retries exceeded"
>>> >>>     state, and use it in snmp_client."
>>> >>>
>>> >>> I think the problem is really about shutting down the agentx connection
>>> >>> when one(1) response is to late. I have
>>> >>> done 2 patches (one that only write a better log message and one that
>>> >>> removes the "bad" code.
>>> >>> With these patches I don't get any crash. I think that 5.7.3 has this issue
>>> >>> as well, but it can not be crashed with the agentofdead code
>>> >>>
>>> >>> Can you please try this?
>>> >>>
>>> >>> Regards
>>> >>> Anders Wallin
>>> >>>
>>> >>>
>>> >>> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:
>>> >>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I have compared net-snmp-5.7.3 and net-snmp-5.8 and I have found, that
>>> >>>> following callbacks in snmplib/snmp_api.c causes the core dump issue:
>>> >>>>
>>> >>>> --- old/snmplib/snmp_api.c 2019-04-03 12:13:55.126769866 +0200
>>> >>>> +++ new/snmplib/snmp_api.c 2019-04-03 12:15:18.353420790 +0200
>>> >>>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
>>> >>>>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
>>> >>>>          sp->s_errno = errno;
>>> >>>>          snmp_set_detail(strerror(errno));
>>> >>>> -        if (rp->callback)
>>> >>>> +/*      if (rp->callback)
>>> >>>>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
>>> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>>> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>>> >>>>          return -1;
>>> >>>>      } else {
>>> >>>>          netsnmp_get_monotonic_clock(&now);
>>> >>>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
>>> >>>>          tv.tv_sec += tv.tv_usec / 1000000L;
>>> >>>>          tv.tv_usec %= 1000000L;
>>> >>>>          rp->expireM = tv;
>>> >>>> -        if (rp->callback)
>>> >>>> +/*      if (rp->callback)
>>> >>>>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>>> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>>> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>>> >>>>      }
>>> >>>>      return 0;
>>> >>>>  }
>>> >>>>
>>> >>>> Without them, all works as expected.
>>> >>>>
>>> >>>> Josef Ridky
>>> >>>> Software Engineer
>>> >>>> Core Services Team
>>> >>>> Red Hat Czech, s.r.o.
>>> >>>>
>>> >>>> ----- Original Message -----
>>> >>>> | From: "Anders Wallin" <walli...@gmail.com>
>>> >>>> | To: "Josef Ridky" <jri...@redhat.com>
>>> >>>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>>> >>>> | Sent: Tuesday, April 2, 2019 6:27:54 PM
>>> >>>> | Subject: Re: Core dump with net-snmp-5.8
>>> >>>> |
>>> >>>> | Hi Josef,
>>> >>>> | I can reproduce the issue using the master branch, I will take a look at it
>>> >>>> | later tonight or tomorrow
>>> >>>> |
>>> >>>> | Regards
>>> >>>> | Anders Wallin
>>> >>>> |
>>> >>>> |
>>> >>>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
>>> >>>> |
>>> >>>> | > Hi,
>>> >>>> | >
>>> >>>> | > thanks for your patch. Unfortunately, even when I have applied it, it
>>> >>>> | > still ends with core dump due of 'double free or corruption (fasttop)'
>>> >>>> | >
>>> >>>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
>>> >>>> | >
>>> >>>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
>>> >>>> | > snmp_agent: delegate session == 0x56207e165240
>>> >>>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
>>> >>>> | > agentx/master: callback resend
>>> >>>> | > agentx/master: callback resend
>>> >>>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
>>> >>>> | > agentx/master: close 0x56207dfd5400, -1
>>> >>>> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e165240
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e165240
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e165240
>>> >>>> | > snmp_agent: agent_session 0x56207e165240 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e1041a0
>>> >>>> | > snmp_agent: agent_session 0x56207e1041a0 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e1656c0
>>> >>>> | > snmp_agent: agent_session 0x56207e1656c0 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11af40
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e11af40
>>> >>>> | > snmp_agent: agent_session 0x56207e11af40 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e118f00
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e118f00
>>> >>>> | > snmp_agent: agent_session 0x56207e118f00 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11b540
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e11b540
>>> >>>> | > snmp_agent: agent_session 0x56207e11b540 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e11bd00
>>> >>>> | > snmp_agent: agent_session 0x56207e11bd00 released
>>> >>>> | > agentx/master: Continue removing delegated subsession reqests
>>> >>>> | > agentx/master: close transport
>>> >>>> | > snmp_agent: REMOVE session == 0x56207dfd5400
>>> >>>> | > agentx/master: response too late on session 0x56207dfd5400
>>> >>>> | > agentx/master: response too late on session 0x56207dfd5400
>>> >>>> | > double free or corruption (fasttop)
>>> >>>> | > Aborted (core dumped)
>>> >>>> | >
>>> >>>> | >
>>> >>>> | > What's interesting, when I run it with -DALL it pass (at least for several
>>> >>>> | > rounds).
>>> >>>> | > It looks like some strange race condition.
>>> >>>> | >
>>> >>>> | > Regards
>>> >>>> | >
>>> >>>> | > Josef Ridky
>>> >>>> | > Software Engineer
>>> >>>> | > Core Services Team
>>> >>>> | > Red Hat Czech, s.r.o.
>>> >>>> | >
>>> >>>> | > ----- Original Message -----
>>> >>>> | > | From: "Anders Wallin" <walli...@gmail.com>
>>> >>>> | > | To: "Josef Ridky" <jri...@redhat.com>
>>> >>>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>>> >>>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
>>> >>>> | > | Subject: Re: Core dump with net-snmp-5.8
>>> >>>> | > |
>>> >>>> | > | Hi Josef,
>>> >>>> | > |
>>> >>>> | > | I think it's the same issue as
>>> >>>> | > | https://sourceforge.net/p/net-snmp/bugs/2914/
>>> >>>> | > | (where I also posted the solution)
>>> >>>> | > | Regards
>>> >>>> | > | Anders Wallin
>>> >>>> | > |
>>> >>>> | > |
>>> >>>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> wrote:
>>> >>>> | > |
>>> >>>> | > | > Hi,
>>> >>>> | > | >
>>> >>>> | > | > recently, I have hit to an issue in net-snmp-5.8, that is connected to the
>>> >>>> | > | > bug report [1].
>>> >>>> | > | >
>>> >>>> | > | > When I tried to run agentofdeath test from [1], snmpd daemon will crash
>>> >>>> | > | > with malloc(): smallbin double linked list corrupted or double free() issue
>>> >>>> | > | > and dumps core (see bellow).
>>> >>>> | > | > From log file, I can identified one issue with "Unknown operation".
>>> >>>> | > | >
>>> >>>> | > | > This issue is in the agentx_got_response function
>>> >>>> | > | > (agent/mibgroup/agentx/master.c). There isn't implemented action for
>>> >>>> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
>>> >>>> | > | > include/net-snmp/library/snmp_api.h).
>>> >>>> | > | > As result "Unknown operation 6 in agentx_got_response" is shown in log
>>> >>>> | > | > file.
>>> >>>> | > | >
>>> >>>> | > | > /var/log/messages
>>> >>>> | > | > -------------------------------
>>> >>>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
>>> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
>>> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double linked list corrupted
>>> >>>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump (PID 13652/UID 0).
>>> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
>>> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with result 'core-dump'.
>>> >>>> | > | > -------------------------------
>>> >>>> | > | >
>>> >>>> | > | > The "Unknown operation" callback is caused by newly added piece of code in
>>> >>>> | > | > snmplib/snmp_api.c:
>>> >>>> | > | >
>>> >>>> | > | > static int
>>> >>>> | > | > snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
>>> >>>> | > | >                     int incr_retries)
>>> >>>> | > | > {
>>> >>>> | > | >
>>> >>>> | > | > ...
>>> >>>> | > | >
>>> >>>> | > | >         tv.tv_sec += tv.tv_usec / 1000000L;
>>> >>>> | > | >         tv.tv_usec %= 1000000L;
>>> >>>> | > | >         rp->expireM = tv;
>>> >>>> | > | > +       if (rp->callback)
>>> >>>> | > | > +           rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>>> >>>> | > | > +                        rp->pdu->reqid, rp->pdu, rp->cb_data);
>>> >>>> | > | >     }
>>> >>>> | > | >     return 0;
>>> >>>> | > | > }
>>> >>>> | > | >
>>> >>>> | > | >
>>> >>>> | > | > When I tried to remove it, it just stop complaining about operation 6, but
>>> >>>> | > | > the core dump is still present.
>>> >>>> | > | >
>>> >>>> | > | > May I ask you for help with this issue? Do you have any idea, what causing
>>> >>>> | > | > this issue in 5.8 and how to fix it?
>>> >>>> | > | > I know, that Jan Safranek has fixed this for 5.7 by commit [2], but it
>>> >>>> | > | > looks like something other has changed and this issue is current again.
>>> >>>> | > | >
>>> >>>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
>>> >>>> | > | > [2] https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
>>> >>>> | > | >
>>> >>>> | > | > Regards
>>> >>>> | > | >
>>> >>>> | > | > Josef Ridky
>>> >>>> | > | > Software Engineer
>>> >>>> | > | > Core Services Team
>>> >>>> | > | > Red Hat Czech, s.r.o.
>>> >>>> | > | >
>>> >>>> | > | >
>>> >>>> | > | >
>>> >>>> | > | > _______________________________________________
>>> >>>> | > | > Net-snmp-coders mailing list
>>> >>>> | > | > Net-snmp-coders@lists.sourceforge.net
>>> >>>> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
>>> >>>> | > | >
>>> >>>> | > |
>>> >>>> | >
>>> >>>> |
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> _______________________________________________
>>> >>> Net-snmp-coders mailing list
>>> >>> Net-snmp-coders@lists.sourceforge.net
>>> >>> https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
>>> >>>
>>> >>
>>> >
>>>
>> _______________________________________________
>> Net-snmp-coders mailing list
>> Net-snmp-coders@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
>>
>
>
> --
> *Sam Tannous*
> Engineering
> Cumulus Networks®
> +1 650 383 6700 x 1106
> www.cumulusnetworks.com
>
> Evaluate Cumulus® Linux®
> https://cumulusnetworks.com/product/secure/evaluate/
>
> Become a Partner
> http://cumulusnetworks.com/partners/become-a-partner/
>

--
*Sam Tannous*
Engineering
Cumulus Networks®
+1 650 383 6700 x 1106
www.cumulusnetworks.com

Evaluate Cumulus® Linux®
https://cumulusnetworks.com/product/secure/evaluate/

Become a Partner
http://cumulusnetworks.com/partners/become-a-partner/
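
For anyone reading this thread later, the failure mode discussed here can be
sketched in a few lines of C. This is only a toy model under stated
assumptions, not the net-snmp code: the names cb_op, request_state,
release_state, got_response and count_releases are hypothetical, and a counter
stands in for free() so the repeated release can be observed safely. The only
details taken from the thread are that the resend operation is reported as
"operation 6" and that, when no case handles it, the callback ends up in its
default branch.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the callback operation codes. Only the value 6
 * for the resend operation comes from the "Unknown operation 6" log line in
 * this thread; the other values are made up for the sketch. */
enum cb_op {
    OP_RECEIVED  = 1,
    OP_TIMED_OUT = 2,
    OP_RESEND    = 6
};

/* Per-request state owned by the callback (hypothetical). */
struct request_state {
    int releases;   /* number of times the state has been released */
};

/* Stand-in for free(): count releases instead of freeing for real. */
static void release_state(struct request_state *st)
{
    st->releases++;
}

/* Toy response callback. With handle_resend == 0 a resend falls into the
 * default case and releases state that is still in use. */
static void got_response(enum cb_op op, struct request_state *st,
                         int handle_resend)
{
    if (op == OP_RESEND && handle_resend)
        return;                 /* request is still pending, keep the state */

    switch (op) {
    case OP_RECEIVED:
    case OP_TIMED_OUT:
        release_state(st);      /* request finished, release once */
        break;
    default:
        fprintf(stderr, "Unknown operation %d\n", (int)op);
        release_state(st);      /* cleanup on a request that is not finished */
        break;
    }
}

/* Replay "resend, resend, timeout" (the order of callbacks in the quoted
 * debug log) and report how many times the state was released. */
static int count_releases(int handle_resend)
{
    struct request_state st = { 0 };

    got_response(OP_RESEND, &st, handle_resend);
    got_response(OP_RESEND, &st, handle_resend);
    got_response(OP_TIMED_OUT, &st, handle_resend);
    return st.releases;
}
```

In this model count_releases(0) stands for the unpatched flow and releases the
same state three times, while count_releases(1) stands for a callback that
returns early on resend and releases it exactly once.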