Hi Anders,

I fixed some SNMPv3 (bulkget) core dumps a while ago:
https://sourceforge.net/p/net-snmp/patches/1388/

While not directly related to this thread, those core dumps (double frees)
were easily triggered by any error condition within a v3 bulkget.

I'm hoping my patch will get picked up soon :-(

Thanks,
Sam

On Tue, Apr 9, 2019 at 6:54 AM Anders Wallin <walli...@gmail.com> wrote:

> Now it works fine!
>
> thx
> Anders Wallin
>
>
> On Tue, Apr 9, 2019 at 2:26 AM Masayoshi Mizuma <msys.miz...@gmail.com>
> wrote:
>
>> Hi Anders,
>>
>> Thank you for your feedback!
>> I attach the v2 patch. Could you try it?
>>
>> In the v1 patch, I missed the check for the request callback, so the
>> request gets removed even though the callback doesn't run.
>>
>> Thanks,
>> Masa
>>
>> On 4/8/19 11:06 AM, Anders Wallin wrote:
>> > Hi Masa,
>> >
>> > It looks like it solves the problem reported by Josef, BUT it breaks
>> > DTLSUDP. I ran the tests without analyzing why.
>> > To reproduce the issue I did the following using the net-snmp master
>> > branch, plus these patches:
>> > 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the sending is failed (10 minutes ago) <Masayoshi Mizuma>
>> > 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin>
>> > a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days ago) <Anders Wallin>
>> > eaad09d04 - (origin/master, origin/HEAD, master) Merge branch 'V5-8-patches' (9 weeks ago) <Bart Van Assche>
>> >
>> > $ ./configure --prefix=/usr \
>> >                 --with-persistent-directory=/var/lib/net-snmp \
>> >                 --with-mib-modules='smux tlstm-mib tsm-mib examples/example examples/notification' \
>> >                 --with-security-modules="tsm" \
>> >                 --with-transports="TLSTCP DTLSUDP" \
>> >                 --enable-shared \
>> >                 --with-defaults \
>> >                 --enable-ipv6 \
>> >                 --with-cflags="-g -O2" \
>> >                 --without-elf
>> >
>> > $ make install
>> > $ cd testing
>> > $ ./RUNFULLTESTS -g tls
>> > DTLS-UDP user certificate tests .......................... 41/?
>> > This hangs forever at "41", with snmpd.log saying:
>> > ......
>> > 2019-04-08 16:29:11
>> > 2019-04-08 16:29:11
>> > Received 0 byte packet from DTLSUDP: unknown
>> > 2019-04-08 16:29:11
>> > 2019-04-08 16:29:13
>> > Received 0 byte packet from DTLSUDP: unknown
>> > 2019-04-08 16:29:13
>> > 2019-04-08 16:29:15
>> > Received 0 byte packet from DTLSUDP: unknown
>> > 2019-04-08 16:29:15
>> > 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 depth=0 err=18:self signed certificate
>> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
>> > 2019-04-08 16:29:15  TLS error: SSL_read: rc=-1, sslerror = 1 (SSL_ERROR_SSL)
>> > 2019-04-08 16:29:15  TLS Error: certificate verify failed
>> > 2019-04-08 16:29:15 ---- End of OpenSSL Errors ----
>> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
>> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>> > 2019-04-08 16:29:15 TLS Error: (null)
>> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
>> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>> > 2019-04-08 16:29:16 TLS Error: (null)
>> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
>> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
>> > 2019-04-08 16:29:16 TLS Error: (null)
>> >
>> > With the fix suggested by Josef I don't see the DTLSUDP problem, but
>> > maybe there are other problems.
>> >
>> > Regards
>> > Anders Wallin
>> >
>> > PS. Thanks for adding commit info to a420d87d3; I updated the patch with
>> > your commit comments.
>> >
>> >
>> > On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com>
>> > wrote:
>> >
>> >> Hi Josef,
>> >>
>> >> I attach two patches that fix the memory inconsistency when a request is
>> >> resent and then times out.
>> >> Could you try them?
>> >>
>> >> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch
>> >>
>> >>   This patch was posted by Anders; I added the description.
>> >>   It adds the missing handling of the NETSNMP_CALLBACK_OP_RESEND callback.
>> >>
>> >> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch
>> >>
>> >>   This patch fixes the race between the NETSNMP_CALLBACK_OP_SEND_FAILED
>> >>   and NETSNMP_CALLBACK_OP_TIMED_OUT callbacks. If sending the request
>> >>   fails, the request is removed from the internal session.
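
The idea behind the second patch can be sketched in isolation. What follows
is a minimal, self-contained C illustration and not the net-snmp code: the
request and session structs, add_request(), on_send_failed(), expire_all()
and count_callback() are hypothetical names, and the two OP_* values are
assumptions. It assumes, based on the v2 description in this thread, that a
request is dropped on send failure only when its callback actually ran, so a
later timeout sweep cannot report (and release) the same request twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the NETSNMP_CALLBACK_OP_* codes; values are assumed. */
enum { OP_TIMED_OUT = 2, OP_SEND_FAILED = 3 };

typedef void (*req_callback)(int op, int reqid, void *cb_data);

typedef struct request {
    int             reqid;
    req_callback    callback;
    void           *cb_data;
    struct request *next;
} request;

typedef struct session {
    request *requests;          /* singly linked list of pending requests */
} session;

/* Example callback: cb_data points at an int that counts invocations. */
void count_callback(int op, int reqid, void *cb_data)
{
    (void)op;
    (void)reqid;
    (*(int *)cb_data)++;
}

/* Queue a pending request; returns NULL if allocation fails. */
request *add_request(session *s, int reqid, req_callback cb, void *cb_data)
{
    request *rp = calloc(1, sizeof(*rp));
    if (rp == NULL)
        return NULL;
    rp->reqid = reqid;
    rp->callback = cb;
    rp->cb_data = cb_data;
    rp->next = s->requests;
    s->requests = rp;
    return rp;
}

/* Unlink rp from the pending list and free it exactly once. */
static void remove_request(session *s, request *rp)
{
    request **pp = &s->requests;
    while (*pp != NULL && *pp != rp)
        pp = &(*pp)->next;
    assert(*pp == rp);          /* rp must still be on the list */
    *pp = rp->next;
    free(rp);
}

/* Send failed: if the owner has a callback, notify it once and drop the
 * request. Without a callback the request stays pending.
 * Returns 1 if the request was removed, 0 otherwise. */
int on_send_failed(session *s, request *rp)
{
    if (rp->callback == NULL)
        return 0;
    rp->callback(OP_SEND_FAILED, rp->reqid, rp->cb_data);
    remove_request(s, rp);
    return 1;
}

/* Timeout sweep: expire everything that is still pending.
 * Returns the number of requests expired. */
int expire_all(session *s)
{
    int n = 0;
    while (s->requests != NULL) {
        request *rp = s->requests;
        if (rp->callback != NULL)
            rp->callback(OP_TIMED_OUT, rp->reqid, rp->cb_data);
        remove_request(s, rp);
        n++;
    }
    return n;
}
```

In this sketch a request that already received OP_SEND_FAILED is gone from
the list by the time expire_all() runs, so its owner is called back only
once.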
>> >>
>> >> Thanks,
>> >> Masa
>> >>
>> >> On 4/3/19 9:34 AM, Anders Wallin wrote:
>> >>> That code was introduced to fix another issue:
>> >>> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
>> >>> Author: Bill Fenner <fen...@gmail.com>
>> >>> Date:   Wed Dec 20 21:52:10 2017 +0000
>> >>>
>> >>>     NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 agent
>> >>>
>> >>>     With the patch in 1214, the snmp_api code assumed that if magic was
>> >>>     set, it was the "struct synch-state" from snmp_client.  Of course,
>> >>>     magic belongs to the caller, and the perl library uses it differently,
>> >>>     so reaching into it is verboten.  Introduce a new callback (that
>> >>>     was already introduced in 5.8) to report this "retries exceeded"
>> >>>     state, and use it in snmp_client."
>> >>>
>> >>> I think the problem is really about shutting down the agentx connection
>> >>> when one (1) response is too late. I have made two patches: one that
>> >>> only writes a better log message, and one that removes the "bad" code.
>> >>> With these patches I don't get any crash. I think that 5.7.3 has this
>> >>> issue as well, but it cannot be crashed with the agentofdeath code.
>> >>>
>> >>> Can you please try this?
>> >>>
>> >>> Regards
>> >>> Anders Wallin
>> >>>
>> >>>
>> >>> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com>
>> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I have compared net-snmp-5.7.3 and net-snmp-5.8 and found that the
>> >>>> following callbacks in snmplib/snmp_api.c cause the core dump issue:
>> >>>>
>> >>>> --- old/snmplib/snmp_api.c      2019-04-03 12:13:55.126769866 +0200
>> >>>> +++ new/snmplib/snmp_api.c      2019-04-03 12:15:18.353420790 +0200
>> >>>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
>> >>>>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
>> >>>>          sp->s_errno = errno;
>> >>>>          snmp_set_detail(strerror(errno));
>> >>>> -        if (rp->callback)
>> >>>> +/*        if (rp->callback)
>> >>>>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
>> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>> >>>>          return -1;
>> >>>>      } else {
>> >>>>          netsnmp_get_monotonic_clock(&now);
>> >>>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
>> >>>>          tv.tv_sec += tv.tv_usec / 1000000L;
>> >>>>          tv.tv_usec %= 1000000L;
>> >>>>          rp->expireM = tv;
>> >>>> -        if (rp->callback)
>> >>>> +/*        if (rp->callback)
>> >>>>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>> >>>>      }
>> >>>>      return 0;
>> >>>>  }
>> >>>>
>> >>>> Without them, everything works as expected.
>> >>>>
>> >>>> Josef Ridky
>> >>>> Software Engineer
>> >>>> Core Services Team
>> >>>> Red Hat Czech, s.r.o.
>> >>>>
>> >>>> ----- Original Message -----
>> >>>> | From: "Anders Wallin" <walli...@gmail.com>
>> >>>> | To: "Josef Ridky" <jri...@redhat.com>
>> >>>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>> >>>> | Sent: Tuesday, April 2, 2019 6:27:54 PM
>> >>>> | Subject: Re: Core dump with net-snmp-5.8
>> >>>> |
>> >>>> | Hi Josef,
>> >>>> | I can reproduce the issue using the master branch. I will take a look
>> >>>> | at it later tonight or tomorrow.
>> >>>> |
>> >>>> | Regards
>> >>>> | Anders Wallin
>> >>>> |
>> >>>> |
>> >>>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com>
>> wrote:
>> >>>> |
>> >>>> | > Hi,
>> >>>> | >
>> >>>> | > thanks for your patch. Unfortunately, even with it applied, it
>> >>>> | > still ends with a core dump due to 'double free or corruption (fasttop)'.
>> >>>> | >
>> >>>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
>> >>>> | >
>> >>>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
>> >>>> | > snmp_agent: delegate session == 0x56207e165240
>> >>>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
>> >>>> | > agentx/master: callback resend
>> >>>> | > agentx/master: callback resend
>> >>>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
>> >>>> | > agentx/master: close 0x56207dfd5400, -1
>> >>>> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e165240
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e165240
>> >>>> | > snmp_agent: REMOVE session == 0x56207e165240
>> >>>> | > snmp_agent: agent_session 0x56207e165240 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
>> >>>> | > snmp_agent: REMOVE session == 0x56207e1041a0
>> >>>> | > snmp_agent: agent_session 0x56207e1041a0 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
>> >>>> | > snmp_agent: REMOVE session == 0x56207e1656c0
>> >>>> | > snmp_agent: agent_session 0x56207e1656c0 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11af40
>> >>>> | > snmp_agent: REMOVE session == 0x56207e11af40
>> >>>> | > snmp_agent: agent_session 0x56207e11af40 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e118f00
>> >>>> | > snmp_agent: REMOVE session == 0x56207e118f00
>> >>>> | > snmp_agent: agent_session 0x56207e118f00 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11b540
>> >>>> | > snmp_agent: REMOVE session == 0x56207e11b540
>> >>>> | > snmp_agent: agent_session 0x56207e11b540 released
>> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
>> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
>> >>>> | > snmp_agent: REMOVE session == 0x56207e11bd00
>> >>>> | > snmp_agent: agent_session 0x56207e11bd00 released
>> >>>> | > agentx/master: Continue removing delegated subsession reqests
>> >>>> | > agentx/master: close transport
>> >>>> | > snmp_agent: REMOVE session == 0x56207dfd5400
>> >>>> | > agentx/master: response too late on session 0x56207dfd5400
>> >>>> | > agentx/master: response too late on session 0x56207dfd5400
>> >>>> | > double free or corruption (fasttop)
>> >>>> | > Aborted (core dumped)
>> >>>> | >
>> >>>> | >
>> >>>> | > Interestingly, when I run it with -DALL it passes (at least for
>> >>>> | > several rounds).
>> >>>> | > It looks like some strange race condition.
>> >>>> | >
>> >>>> | > Regards
>> >>>> | >
>> >>>> | > Josef Ridky
>> >>>> | > Software Engineer
>> >>>> | > Core Services Team
>> >>>> | > Red Hat Czech, s.r.o.
>> >>>> | >
>> >>>> | > ----- Original Message -----
>> >>>> | > | From: "Anders Wallin" <walli...@gmail.com>
>> >>>> | > | To: "Josef Ridky" <jri...@redhat.com>
>> >>>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>> >>>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
>> >>>> | > | Subject: Re: Core dump with net-snmp-5.8
>> >>>> | > |
>> >>>> | > | Hi Josef,
>> >>>> | > |
>> >>>> | > | I think it's the same issue as
>> >>>> | > https://sourceforge.net/p/net-snmp/bugs/2914/
>> >>>> | > | (where I also posted the solution)
>> >>>> | > | Regards
>> >>>> | > | Anders Wallin
>> >>>> | > |
>> >>>> | > |
>> >>>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com
>> >
>> >>>> wrote:
>> >>>> | > |
>> >>>> | > | > Hi,
>> >>>> | > | >
>> >>>> | > | > recently I hit an issue in net-snmp-5.8 that is connected to the
>> >>>> | > | > bug report [1].
>> >>>> | > | >
>> >>>> | > | > When I run the agentofdeath test from [1], the snmpd daemon crashes
>> >>>> | > | > with a "malloc(): smallbin double linked list corrupted" or
>> >>>> | > | > "double free()" error and dumps core (see below).
>> >>>> | > | > From the log file, I was able to identify one issue, reported as
>> >>>> | > | > "Unknown operation".
>> >>>> | > | >
>> >>>> | > | > This issue is in the agentx_got_response function
>> >>>> | > | > (agent/mibgroup/agentx/master.c). No action is implemented for
>> >>>> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
>> >>>> | > | > include/net-snmp/library/snmp_api.h).
>> >>>> | > | > As a result, "Unknown operation 6 in agentx_got_response" is shown
>> >>>> | > | > in the log file.
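
The gap described above can be sketched in isolation. This is a minimal,
self-contained C illustration and not the net-snmp source: the OP_*
constants and got_response_sketch() are hypothetical stand-ins, and the real
handler takes different arguments. Only the value 6 for the resend operation
is taken from the log message quoted in this thread; the other values are
assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the NETSNMP_CALLBACK_OP_* codes.
 * Only OP_RESEND == 6 comes from the log ("Unknown operation 6");
 * the remaining values are assumed for this sketch. */
enum {
    OP_RECEIVED_MESSAGE = 1,
    OP_TIMED_OUT        = 2,
    OP_SEND_FAILED      = 3,
    OP_RESEND           = 6
};

/* Returns 1 when the operation was recognised, 0 when it fell through
 * to the "unknown operation" branch. */
int got_response_sketch(int operation)
{
    assert(operation >= 0);     /* operation codes are non-negative here */

    switch (operation) {
    case OP_RECEIVED_MESSAGE:
    case OP_TIMED_OUT:
    case OP_SEND_FAILED:
        return 1;               /* real processing elided */
    case OP_RESEND:
        /* A resend is not an error: the request is still pending,
         * so there is nothing to clean up and nothing to log. */
        return 1;
    default:
        fprintf(stderr, "Unknown operation %d in got_response_sketch\n",
                operation);
        return 0;
    }
}
```

With an explicit case for the resend operation, the handler returns early
instead of reaching the default branch that logs "Unknown operation".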
>> >>>> | > | >
>> >>>> | > | > /var/log/messages
>> >>>> | > | > -------------------------------
>> >>>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
>> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
>> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double linked list corrupted
>> >>>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump (PID 13652/UID 0).
>> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
>> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with result 'core-dump'.
>> >>>> | > | > -------------------------------
>> >>>> | > | >
>> >>>> | > | > The "Unknown operation" message is caused by a newly added piece
>> >>>> | > | > of code in snmplib/snmp_api.c:
>> >>>> | > | >
>> >>>> | > | >  static int
>> >>>> | > | >  snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
>> >>>> | > | >                      int incr_retries)
>> >>>> | > | >  {
>> >>>> | > | >
>> >>>> | > | > ...
>> >>>> | > | >
>> >>>> | > | >          tv.tv_sec += tv.tv_usec / 1000000L;
>> >>>> | > | >          tv.tv_usec %= 1000000L;
>> >>>> | > | >          rp->expireM = tv;
>> >>>> | > | > +        if (rp->callback)
>> >>>> | > | > +            rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>> >>>> | > | > +                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>> >>>> | > | >      }
>> >>>> | > | >      return 0;
>> >>>> | > | >  }
>> >>>> | > | >
>> >>>> | > | >
>> >>>> | > | > When I removed it, snmpd just stopped complaining about
>> >>>> | > | > operation 6, but the core dump is still present.
>> >>>> | > | >
>> >>>> | > | > May I ask you for help with this issue? Do you have any idea what
>> >>>> | > | > is causing it in 5.8 and how to fix it?
>> >>>> | > | > I know that Jan Safranek fixed this for 5.7 with commit [2], but it
>> >>>> | > | > looks like something else has changed and the issue has reappeared.
>> >>>> | > | >
>> >>>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
>> >>>> | > | > [2] https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
>> >>>> | > | >
>> >>>> | > | > Regards
>> >>>> | > | >
>> >>>> | > | > Josef Ridky
>> >>>> | > | > Software Engineer
>> >>>> | > | > Core Services Team
>> >>>> | > | > Red Hat Czech, s.r.o.
>> >>>> | > | >
>> >>>> | > | >
>> >>>> | > | >
>> >>>> | > | > _______________________________________________
>> >>>> | > | > Net-snmp-coders mailing list
>> >>>> | > | > Net-snmp-coders@lists.sourceforge.net
>> >>>> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
>> >>>> | > | >
>> >>>> | > |
>> >>>> | >
>> >>>> |
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >
>>
>


-- 
*Sam Tannous*
Engineering
Cumulus Networks®
+1 650 383 6700 x 1106
www.cumulusnetworks.com
