Hi Anders,

We've been testing the patch for

https://sourceforge.net/p/net-snmp/bugs/2914/

and this works.

Before the patch, we were getting lots of core dumps: the flow in
master.c hits the default case, memory gets double-freed, and the
process dumps core.
We'd love to get this into V5-8-patches soon.

Thanks,
Sam



On Tue, Apr 9, 2019 at 9:13 AM Sam Tannous <stann...@cumulusnetworks.com> wrote:

> Hi Anders,
>
> I fixed some SNMPv3 (bulkget) core dumps a while ago.
> https://sourceforge.net/p/net-snmp/patches/1388/
>
> While not directly related, the (double-free memory) core dumps
> were easily triggered by any error condition within a v3 bulkget.
>
> I'm hoping my patch will get picked up soon :-(
>
> Thanks,
> Sam
>
> On Tue, Apr 9, 2019 at 6:54 AM Anders Wallin <walli...@gmail.com> wrote:
>
>> Now it works fine!
>>
>> thx
>> Anders Wallin
>>
>>
>> On Tue, Apr 9, 2019 at 2:26 AM Masayoshi Mizuma <msys.miz...@gmail.com> wrote:
>>
>>> Hi Anders,
>>>
>>> Thank you for your feedback!
>>> I've attached the v2 patch. Could you try it?
>>>
>>> In the v1 patch, I missed the check for the request callback, so the
>>> request gets removed even though the callback doesn't run.
>>>
>>> Thanks,
>>> Masa
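A minimal sketch of the bookkeeping described in the v2 note above and in the 0002 patch. Every name here (`struct request`, `struct session`, `handle_send_failure()`, `timeout_sweep()`, `count_cb()`) is a hypothetical simplification, not net-snmp's `session_list`/`netsnmp_request_list` code. When a (re)send fails and the request has a callback, the callback is told once and the request is unlinked, so the timeout sweep cannot report it (and free its data) a second time. A request without a callback stays queued for the normal timeout path, which is the check the v2 patch adds.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for net-snmp's request bookkeeping. */
enum req_op { OP_SEND_FAILED, OP_TIMED_OUT };

struct request {
    int reqid;
    int (*callback)(enum req_op op, int reqid, void *cb_data);
    void *cb_data;              /* owned by whichever callback runs */
    struct request *next;
};

struct session {
    struct request *requests;   /* outstanding requests */
};

/* Remove rp from the session's singly linked list (no free). */
static void unlink_request(struct session *sp, struct request *rp)
{
    struct request **pp = &sp->requests;
    while (*pp && *pp != rp)
        pp = &(*pp)->next;
    if (*pp)
        *pp = rp->next;
}

/* Called when (re)sending rp failed. */
static int handle_send_failure(struct session *sp, struct request *rp)
{
    if (rp->callback) {
        rp->callback(OP_SEND_FAILED, rp->reqid, rp->cb_data);
        /* The callback consumed cb_data: drop the request so the
         * timeout sweep never reports it again (no double free). */
        unlink_request(sp, rp);
        free(rp);
    }
    /* Without a callback nobody was notified, so the request stays
     * queued and the normal timeout path still handles it. */
    return -1;
}

/* Time out every request still queued on the session. */
static void timeout_sweep(struct session *sp)
{
    struct request *rp = sp->requests;
    while (rp) {
        struct request *next = rp->next;
        if (rp->callback)
            rp->callback(OP_TIMED_OUT, rp->reqid, rp->cb_data);
        free(rp);
        rp = next;
    }
    sp->requests = NULL;
}

static int calls;   /* how many final callbacks have run */

/* Example callback: frees its data exactly once. */
static int count_cb(enum req_op op, int reqid, void *cb_data)
{
    (void)op;
    (void)reqid;
    free(cb_data);
    calls++;
    return 1;
}
```

The ownership rule in the sketch is that whichever callback runs frees `cb_data`, so each request must reach exactly one final callback; unlinking the request on send failure is what enforces that.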
>>>
>>> On 4/8/19 11:06 AM, Anders Wallin wrote:
>>> > Hi Masa,
>>> >
>>> > Looks like it solves the problem reported by Josef, BUT it breaks
>>> > DTLSUDP. I ran the tests without analyzing why.
>>> > To reproduce the issue, I did the following using the net-snmp master
>>> > branch plus these patches:
>>> > 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the sending is failed (10 minutes ago) <Masayoshi Mizuma>
>>> > 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin>
>>> > a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days ago) <Anders Wallin>
>>> > eaad09d04 - (origin/master, origin/HEAD, master) Merge branch 'V5-8-patches' (9 weeks ago) <Bart Van Assche>
>>> >
>>> > $ ./configure --prefix=/usr \
>>> >                 --with-persistent-directory=/var/lib/net-snmp \
>>> >                 --with-mib-modules='smux tlstm-mib tsm-mib examples/example examples/notification' \
>>> >                 --with-security-modules="tsm" \
>>> >                 --with-transports="TLSTCP DTLSUDP" \
>>> >                 --enable-shared \
>>> >                 --with-defaults \
>>> >                 --enable-ipv6 \
>>> >                 --with-cflags="-g -O2" \
>>> >                 --without-elf
>>> >
>>> > $ make install
>>> > $ cd testing
>>> > $ ./RUNFULLTESTS -g tls
>>> > DTLS-UDP user certificate tests .......................... 41/?
>>> > This hangs forever at "41", with snmpd.log saying:
>>> > ......
>>> > 2019-04-08 16:29:11
>>> > 2019-04-08 16:29:11
>>> > Received 0 byte packet from DTLSUDP: unknown
>>> > 2019-04-08 16:29:11
>>> > 2019-04-08 16:29:13
>>> > Received 0 byte packet from DTLSUDP: unknown
>>> > 2019-04-08 16:29:13
>>> > 2019-04-08 16:29:15
>>> > Received 0 byte packet from DTLSUDP: unknown
>>> > 2019-04-08 16:29:15
>>> > 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 depth=0 err=18:self signed certificate
>>> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
>>> > 2019-04-08 16:29:15  TLS error: SSL_read: rc=-1, sslerror = 1 (SSL_ERROR_SSL)
>>> > 2019-04-08 16:29:15  TLS Error: certificate verify failed
>>> > 2019-04-08 16:29:15 ---- End of OpenSSL Errors ----
>>> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
>>> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>>> > 2019-04-08 16:29:15 TLS Error: (null)
>>> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
>>> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>>> > 2019-04-08 16:29:16 TLS Error: (null)
>>> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
>>> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>>> > 2019-04-08 16:29:16 TLS Error: (null)
>>> >
>>> > With the fix suggested by Josef, I don't see the DTLSUDP problem, but
>>> > maybe there are other problems.
>>> >
>>> > Regards
>>> > Anders Wallin
>>> >
>>> > PS. thx for adding commit info to a420d87d3. I updated the patch with
>>> > your commit comments.
>>> >
>>> >
>>> > On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com> wrote:
>>> >
>>> >> Hi Josef,
>>> >>
>>> >> I've attached two patches that fix the memory inconsistency when a
>>> >> request is resent and then times out.
>>> >> Could you try them?
>>> >>
>>> >> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch
>>> >>
>>> >>   This patch was posted by Anders; I added a description.
>>> >>   It adds the missing handling of the NETSNMP_CALLBACK_OP_RESEND callback.
>>> >>
>>> >> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch
>>> >>
>>> >>   This patch fixes the race between the NETSNMP_CALLBACK_OP_SEND_FAILED
>>> >>   and NETSNMP_CALLBACK_OP_TIMED_OUT callbacks. If sending the request
>>> >>   fails, the request is removed from the internal session.
>>> >>
>>> >> Thanks,
>>> >> Masa
>>> >>
>>> >> On 4/3/19 9:34 AM, Anders Wallin wrote:
>>> >>> The introduction of that code fixes another issue:
>>> >>> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
>>> >>> Author: Bill Fenner <fen...@gmail.com>
>>> >>> Date:   Wed Dec 20 21:52:10 2017 +0000
>>> >>>
>>> >>>     NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 agent
>>> >>>
>>> >>>     With the patch in 1214, the snmp_api code assumed that if magic was
>>> >>>     set, it was the "struct synch-state" from snmp_client.  Of course,
>>> >>>     magic belongs to the caller, and the perl library uses it differently,
>>> >>>     so reaching into it is verboten.  Introduce a new callback (that
>>> >>>     was already introduced in 5.8) to report this "retries exceeded"
>>> >>>     state, and use it in snmp_client."
>>> >>>
>>> >>> I think the problem is really about shutting down the agentx connection
>>> >>> when one (1) response is too late. I have made 2 patches: one that only
>>> >>> writes a better log message, and one that removes the "bad" code.
>>> >>> With these patches I don't get any crash. I think 5.7.3 has this issue
>>> >>> as well, but it can't be crashed with the agentofdeath code.
>>> >>> Can you please try this?
>>> >>>
>>> >>> Regards
>>> >>> Anders Wallin
>>> >>>
>>> >>>
>>> >>> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:
>>> >>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I have compared net-snmp-5.7.3 and net-snmp-5.8 and found that the
>>> >>>> following callbacks in snmplib/snmp_api.c cause the core dump issue:
>>> >>>>
>>> >>>> --- old/snmplib/snmp_api.c      2019-04-03 12:13:55.126769866 +0200
>>> >>>> +++ new/snmplib/snmp_api.c      2019-04-03 12:15:18.353420790 +0200
>>> >>>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
>>> >>>>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
>>> >>>>          sp->s_errno = errno;
>>> >>>>          snmp_set_detail(strerror(errno));
>>> >>>> -        if (rp->callback)
>>> >>>> +/*        if (rp->callback)
>>> >>>>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
>>> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>>> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>>> >>>>          return -1;
>>> >>>>      } else {
>>> >>>>          netsnmp_get_monotonic_clock(&now);
>>> >>>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
>>> >>>>          tv.tv_sec += tv.tv_usec / 1000000L;
>>> >>>>          tv.tv_usec %= 1000000L;
>>> >>>>          rp->expireM = tv;
>>> >>>> -        if (rp->callback)
>>> >>>> +/*        if (rp->callback)
>>> >>>>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>>> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>>> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>>> >>>>      }
>>> >>>>      return 0;
>>> >>>>  }
>>> >>>>
>>> >>>> Without them, everything works as expected.
>>> >>>>
>>> >>>> Josef Ridky
>>> >>>> Software Engineer
>>> >>>> Core Services Team
>>> >>>> Red Hat Czech, s.r.o.
>>> >>>>
>>> >>>> ----- Original Message -----
>>> >>>> | From: "Anders Wallin" <walli...@gmail.com>
>>> >>>> | To: "Josef Ridky" <jri...@redhat.com>
>>> >>>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>>> >>>> | Sent: Tuesday, April 2, 2019 6:27:54 PM
>>> >>>> | Subject: Re: Core dump with net-snmp-5.8
>>> >>>> |
>>> >>>> | Hi Josef,
>>> >>>> | I can reproduce the issue using the master branch; I will take a look
>>> >>>> | at it later tonight or tomorrow.
>>> >>>> |
>>> >>>> | Regards
>>> >>>> | Anders Wallin
>>> >>>> |
>>> >>>> |
>>> >>>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
>>> >>>> |
>>> >>>> | > Hi,
>>> >>>> | >
>>> >>>> | > Thanks for your patch. Unfortunately, even after applying it, it
>>> >>>> | > still ends with a core dump due to 'double free or corruption (fasttop)'.
>>> >>>> | >
>>> >>>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
>>> >>>> | >
>>> >>>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
>>> >>>> | > snmp_agent: delegate session == 0x56207e165240
>>> >>>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
>>> >>>> | > agentx/master: callback resend
>>> >>>> | > agentx/master: callback resend
>>> >>>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
>>> >>>> | > agentx/master: close 0x56207dfd5400, -1
>>> >>>> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e165240
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e165240
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e165240
>>> >>>> | > snmp_agent: agent_session 0x56207e165240 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e1041a0
>>> >>>> | > snmp_agent: agent_session 0x56207e1041a0 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e1656c0
>>> >>>> | > snmp_agent: agent_session 0x56207e1656c0 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11af40
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e11af40
>>> >>>> | > snmp_agent: agent_session 0x56207e11af40 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e118f00
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e118f00
>>> >>>> | > snmp_agent: agent_session 0x56207e118f00 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11b540
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e11b540
>>> >>>> | > snmp_agent: agent_session 0x56207e11b540 released
>>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
>>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
>>> >>>> | > snmp_agent: REMOVE session == 0x56207e11bd00
>>> >>>> | > snmp_agent: agent_session 0x56207e11bd00 released
>>> >>>> | > agentx/master: Continue removing delegated subsession reqests
>>> >>>> | > agentx/master: close transport
>>> >>>> | > snmp_agent: REMOVE session == 0x56207dfd5400
>>> >>>> | > agentx/master: response too late on session 0x56207dfd5400
>>> >>>> | > agentx/master: response too late on session 0x56207dfd5400
>>> >>>> | > double free or corruption (fasttop)
>>> >>>> | > Aborted (core dumped)
>>> >>>> | >
>>> >>>> | >
>>> >>>> | > Interestingly, when I run it with -DALL it passes (at least for
>>> >>>> | > several rounds).
>>> >>>> | > It looks like some strange race condition.
>>> >>>> | >
>>> >>>> | > Regards
>>> >>>> | >
>>> >>>> | > Josef Ridky
>>> >>>> | > Software Engineer
>>> >>>> | > Core Services Team
>>> >>>> | > Red Hat Czech, s.r.o.
>>> >>>> | >
>>> >>>> | > ----- Original Message -----
>>> >>>> | > | From: "Anders Wallin" <walli...@gmail.com>
>>> >>>> | > | To: "Josef Ridky" <jri...@redhat.com>
>>> >>>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>>> >>>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
>>> >>>> | > | Subject: Re: Core dump with net-snmp-5.8
>>> >>>> | > |
>>> >>>> | > | Hi Josef,
>>> >>>> | > |
>>> >>>> | > | I think it's the same issue as
>>> >>>> | > | https://sourceforge.net/p/net-snmp/bugs/2914/
>>> >>>> | > | (where I also posted the solution).
>>> >>>> | > | Regards
>>> >>>> | > | Anders Wallin
>>> >>>> | > |
>>> >>>> | > |
>>> >>>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> wrote:
>>> >>>> | > |
>>> >>>> | > | > Hi,
>>> >>>> | > | >
>>> >>>> | > | > Recently, I hit an issue in net-snmp-5.8 that is connected to
>>> >>>> | > | > bug report [1].
>>> >>>> | > | >
>>> >>>> | > | > When I tried to run the agentofdeath test from [1], the snmpd daemon
>>> >>>> | > | > crashed with a "malloc(): smallbin double linked list corrupted"
>>> >>>> | > | > or double free() error and dumped core (see below).
>>> >>>> | > | > From the log file, I could identify one issue with "Unknown
>>> >>>> | > | > operation".
>>> >>>> | > | >
>>> >>>> | > | > This issue is in the agentx_got_response function
>>> >>>> | > | > (agent/mibgroup/agentx/master.c), which has no action implemented
>>> >>>> | > | > for NETSNMP_CALLBACK_OP_RESEND (defined in
>>> >>>> | > | > include/net-snmp/library/snmp_api.h).
>>> >>>> | > | > As a result, "Unknown operation 6 in agentx_got_response" is shown
>>> >>>> | > | > in the log file.
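To make the "Unknown operation 6" symptom concrete, here is a hypothetical, heavily simplified response callback. `got_response()` and the `OP_*` constants are stand-ins, not net-snmp's definitions; only the value 6 for the resend operation is taken from the log line. Without an `OP_RESEND` case, a resend notification falls through to `default`; with it, the callback returns before any per-request state is touched, which is the approach the 0001 patch title ("Return when NETSNMP_CALLBACK_OP_RESEND") describes.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the callback operation codes. Only
 * OP_RESEND = 6 comes from the log ("Unknown operation 6"); the other
 * values are illustrative. */
enum cb_op {
    OP_RECEIVED_MESSAGE = 1,
    OP_TIMED_OUT = 2,
    OP_SEND_FAILED = 3,
    OP_RESEND = 6
};

/* Returns 1 if the request is finished (the caller may release its
 * state), 0 if the request is still pending. */
static int got_response(int op, int reqid)
{
    switch (op) {
    case OP_RECEIVED_MESSAGE:
        /* Normal reply: process it; the request is done. */
        return 1;
    case OP_TIMED_OUT:
    case OP_SEND_FAILED:
        /* Error path: report the failure; the request is done. */
        return 1;
    case OP_RESEND:
        /* A retransmission is only a notification. The request is
         * still outstanding, so return without touching its state. */
        return 0;
    default:
        fprintf(stderr, "Unknown operation %d in got_response (req=%d)\n",
                op, reqid);
        return 0;
    }
}
```

The sketch's default case only logs; it does not model the cleanup that, according to this thread, followed the default path in master.c and ended in the double free.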
>>> >>>> | > | >
>>> >>>> | > | > /var/log/messages
>>> >>>> | > | > -------------------------------
>>> >>>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
>>> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
>>> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double linked list corrupted
>>> >>>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump (PID 13652/UID 0).
>>> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
>>> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with result 'core-dump'.
>>> >>>> | > | > -------------------------------
>>> >>>> | > | >
>>> >>>> | > | > The "Unknown operation" callback is caused by a newly added piece of
>>> >>>> | > | > code in snmplib/snmp_api.c:
>>> >>>> | > | >
>>> >>>> | > | >  static int
>>> >>>> | > | >  snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
>>> >>>> | > | >                      int incr_retries)
>>> >>>> | > | >  {
>>> >>>> | > | >
>>> >>>> | > | > ...
>>> >>>> | > | >
>>> >>>> | > | >          tv.tv_sec += tv.tv_usec / 1000000L;
>>> >>>> | > | >          tv.tv_usec %= 1000000L;
>>> >>>> | > | >          rp->expireM = tv;
>>> >>>> | > | > +        if (rp->callback)
>>> >>>> | > | > +            rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>>> >>>> | > | > +                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>>> >>>> | > | >      }
>>> >>>> | > | >      return 0;
>>> >>>> | > | >  }
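The two normalization lines in the snippet above carry any whole seconds out of tv_usec into tv_sec before the expiry time is stored. A minimal sketch, assuming the elided lines (`...`) start from the current time and add the retry timeout, in microseconds, to tv_usec; `compute_expiry()` is a hypothetical helper, not a net-snmp function.

```c
#include <sys/time.h>

/* Hypothetical helper: expiry = now + timeout_us, normalized so that
 * 0 <= tv_usec < 1000000. The last two arithmetic lines mirror the
 * quoted snmp_resend_request() code; adding the timeout first is an
 * assumption about the elided part. */
static struct timeval compute_expiry(struct timeval now, long timeout_us)
{
    struct timeval tv = now;

    tv.tv_usec += timeout_us;             /* may exceed one second */
    tv.tv_sec += tv.tv_usec / 1000000L;   /* carry whole seconds */
    tv.tv_usec %= 1000000L;               /* keep the remainder */
    return tv;
}
```

For example, starting at 10.900000 s with a 1.5 s timeout gives tv_usec = 2400000 before normalization and {12, 400000} after it.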
>>> >>>> | > | >
>>> >>>> | > | >
>>> >>>> | > | > When I tried removing it, it just stopped complaining about
>>> >>>> | > | > operation 6, but the core dump is still present.
>>> >>>> | > | >
>>> >>>> | > | > May I ask for your help with this issue? Do you have any idea what
>>> >>>> | > | > is causing this issue in 5.8 and how to fix it?
>>> >>>> | > | > I know that Jan Safranek fixed this for 5.7 in commit [2], but it
>>> >>>> | > | > looks like something else has changed and the issue is back again.
>>> >>>> | > | >
>>> >>>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
>>> >>>> | > | > [2] https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
>>> >>>> | > | >
>>> >>>> | > | > Regards
>>> >>>> | > | >
>>> >>>> | > | > Josef Ridky
>>> >>>> | > | > Software Engineer
>>> >>>> | > | > Core Services Team
>>> >>>> | > | > Red Hat Czech, s.r.o.
>>> >>>> | > | >
>>> >>>> | > | >
>>> >>>> | > | >
>>> >>>> | > | > _______________________________________________
>>> >>>> | > | > Net-snmp-coders mailing list
>>> >>>> | > | > Net-snmp-coders@lists.sourceforge.net
>>> >>>> | > | >
>>> https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
>>> >>>> | > | >
>>> >>>> | > |
>>> >>>> | >
>>> >>>> |
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >
>>>
>>
>
>
> --
> *Sam Tannous*
> Engineering
> Cumulus Networks®
> +1 650 383 6700 x 1106
> <http://www.cumulusnetworks,com>www.cumulusnetworks.com
>
> Evaluate Cumulus® Linux®
> https://cumulusnetworks.com/product/secure/evaluate/
>
> Become a Partner
> http://cumulusnetworks.com/partners/become-a-partner/
>

