Hi Masa,

It looks like it solves the problem reported by Josef, BUT it breaks DTLSUDP.
I ran the tests without analyzing why.
To reproduce the issue I did the following, using the net-snmp master branch
plus these patches:
39485c6f2 - snmplib/snmp_api: Remove the request on the session when the sending is failed (10 minutes ago) <Masayoshi Mizuma>
06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin>
a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days ago) <Anders Wallin>
eaad09d04 - (origin/master, origin/HEAD, master) Merge branch 'V5-8-patches' (9 weeks ago) <Bart Van Assche>

$ ./configure --prefix=/usr \
                --with-persistent-directory=/var/lib/net-snmp \
                --with-mib-modules='smux tlstm-mib tsm-mib examples/example examples/notification' \
                --with-security-modules="tsm" \
                --with-transports="TLSTCP DTLSUDP" \
                --enable-shared \
                --with-defaults \
                --enable-ipv6 \
                --with-cflags="-g -O2" \
                --without-elf

$ make install
$ cd testing
$ ./RUNFULLTESTS -g tls
DTLS-UDP user certificate tests .......................... 41/?
This hangs forever at "41", with snmpd.log showing:
......
2019-04-08 16:29:11
2019-04-08 16:29:11
Received 0 byte packet from DTLSUDP: unknown
2019-04-08 16:29:11
2019-04-08 16:29:13
Received 0 byte packet from DTLSUDP: unknown
2019-04-08 16:29:13
2019-04-08 16:29:15
Received 0 byte packet from DTLSUDP: unknown
2019-04-08 16:29:15
2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 depth=0 err=18:self signed certificate
2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
2019-04-08 16:29:15  TLS error: SSL_read: rc=-1, sslerror = 1 (SSL_ERROR_SSL)
2019-04-08 16:29:15  TLS Error: certificate verify failed
2019-04-08 16:29:15 ---- End of OpenSSL Errors ----
2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
2019-04-08 16:29:15 TLS Error: (null)
2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
2019-04-08 16:29:16 TLS Error: (null)
2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
2019-04-08 16:29:16 TLS Error: (null)

With the fix suggested by Josef I don't see the DTLSUDP problem, but there
may be other problems.

Regards
Anders Wallin

PS. Thanks for adding commit info to a420d87d3; I updated the patch with your
commit comments.


On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com> wrote:

> Hi Josef,
>
> I attach two patches to fix the memory inconsistency when a request is
> resent and times out.
> Could you try them?
>
> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch
>
>   This patch was posted by Anders; I added a description.
>   It adds the missing handling of the NETSNMP_CALLBACK_OP_RESEND callback.
>
> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch
>
>   This patch fixes the race between the NETSNMP_CALLBACK_OP_SEND_FAILED
>   and NETSNMP_CALLBACK_OP_TIMED_OUT callbacks. If the request fails,
>   it is removed from the internal session.
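
The second patch description above can be illustrated with a minimal,
self-contained sketch. It is not the net-snmp code: struct request,
struct session and every function name here are hypothetical stand-ins, and
send_ok only simulates the transport result. The idea shown is that when a
resend fails, the request is unlinked from the session's pending list before
returning, so a later timeout pass cannot act on it a second time.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the net-snmp types. */
struct request {
    int reqid;
    struct request *next;
};

struct session {
    struct request *requests;   /* requests still waiting for a response */
};

/* Push a new pending request; returns 0 on success, -1 on allocation failure. */
static int add_request(struct session *s, int reqid)
{
    struct request *r = malloc(sizeof(*r));
    if (r == NULL)
        return -1;
    r->reqid = reqid;
    r->next = s->requests;
    s->requests = r;
    return 0;
}

/* Unlink and free the request with the given id; returns 1 if it was found. */
static int remove_request(struct session *s, int reqid)
{
    struct request **pp = &s->requests;
    while (*pp != NULL) {
        if ((*pp)->reqid == reqid) {
            struct request *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 1;
        }
        pp = &(*pp)->next;
    }
    return 0;
}

/* Simulated resend: on failure the request leaves the pending list at once. */
static int resend(struct session *s, int reqid, int send_ok)
{
    if (!send_ok) {
        remove_request(s, reqid);
        return -1;
    }
    return 0;
}

/* Timeout pass: frees whatever is still pending and returns how many. */
static int expire_all(struct session *s)
{
    int n = 0;
    while (s->requests != NULL) {
        struct request *r = s->requests;
        s->requests = r->next;
        free(r);
        n++;
    }
    return n;
}
```

With two pending requests, a failed resend(&s, 1, 0) followed by
expire_all(&s) frees only the remaining request, so each request is released
exactly once.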
>
> Thanks,
> Masa
>
> On 4/3/19 9:34 AM, Anders Wallin wrote:
> > The introduction of that code fixes another issue;
> > "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
> > Author: Bill Fenner <fen...@gmail.com>
> > Date:   Wed Dec 20 21:52:10 2017 +0000
> >
> >     NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 agent
> >
> >     With the patch in 1214, the snmp_api code assumed that if magic was
> >     set, it was the "struct synch-state" from snmp_client.  Of course,
> >     magic belongs to the caller, and the perl library uses it differently,
> >     so reaching into it is verboten.  Introduce a new callback (that
> >     was already introduced in 5.8) to report this "retries exceeded"
> >     state, and use it in snmp_client."
> >
> > I think the problem is really about shutting down the agentx connection
> > when one (1) response is too late. I have
> > done 2 patches (one that only writes a better log message and one that
> > removes the "bad" code).
> > With these patches I don't get any crash. I think that 5.7.3 has this issue
> > as well, but it cannot be crashed with the agentofdeath code.
> >
> > Can you please try this?
> >
> > Regards
> > Anders Wallin
> >
> >
> > On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:
> >
> >> Hi,
> >>
> >> I have compared net-snmp-5.7.3 and net-snmp-5.8 and found that the
> >> following callbacks in snmplib/snmp_api.c cause the core dump issue:
> >>
> >> --- old/snmplib/snmp_api.c      2019-04-03 12:13:55.126769866 +0200
> >> +++ new/snmplib/snmp_api.c      2019-04-03 12:15:18.353420790 +0200
> >> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
> >>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
> >>          sp->s_errno = errno;
> >>          snmp_set_detail(strerror(errno));
> >> -        if (rp->callback)
> >> +/*        if (rp->callback)
> >>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
> >> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> >> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
> >>          return -1;
> >>      } else {
> >>          netsnmp_get_monotonic_clock(&now);
> >> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
> >>          tv.tv_sec += tv.tv_usec / 1000000L;
> >>          tv.tv_usec %= 1000000L;
> >>          rp->expireM = tv;
> >> -        if (rp->callback)
> >> +/*        if (rp->callback)
> >>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
> >> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> >> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
> >>      }
> >>      return 0;
> >>  }
> >>
> >> Without them, all works as expected.
> >>
> >> Josef Ridky
> >> Software Engineer
> >> Core Services Team
> >> Red Hat Czech, s.r.o.
> >>
> >> ----- Original Message -----
> >> | From: "Anders Wallin" <walli...@gmail.com>
> >> | To: "Josef Ridky" <jri...@redhat.com>
> >> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
> >> | Sent: Tuesday, April 2, 2019 6:27:54 PM
> >> | Subject: Re: Core dump with net-snmp-5.8
> >> |
> >> | Hi Josef,
> >> | I can reproduce the issue using the master branch. I will take a look at it
> >> | later tonight or tomorrow.
> >> |
> >> | Regards
> >> | Anders Wallin
> >> |
> >> |
> >> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
> >> |
> >> | > Hi,
> >> | >
> >> | > thanks for your patch. Unfortunately, even with it applied, snmpd
> >> | > still ends with a core dump because of 'double free or corruption (fasttop)'.
> >> | >
> >> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
> >> | >
> >> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
> >> | > snmp_agent: delegate session == 0x56207e165240
> >> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
> >> | > agentx/master: callback resend
> >> | > agentx/master: callback resend
> >> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
> >> | > agentx/master: close 0x56207dfd5400, -1
> >> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
> >> | > snmp_agent: processing delegated request, asp = 0x56207e165240
> >> | > snmp_agent: canceling next walk for asp 0x56207e165240
> >> | > snmp_agent: REMOVE session == 0x56207e165240
> >> | > snmp_agent: agent_session 0x56207e165240 released
> >> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
> >> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
> >> | > snmp_agent: REMOVE session == 0x56207e1041a0
> >> | > snmp_agent: agent_session 0x56207e1041a0 released
> >> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
> >> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
> >> | > snmp_agent: REMOVE session == 0x56207e1656c0
> >> | > snmp_agent: agent_session 0x56207e1656c0 released
> >> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
> >> | > snmp_agent: canceling next walk for asp 0x56207e11af40
> >> | > snmp_agent: REMOVE session == 0x56207e11af40
> >> | > snmp_agent: agent_session 0x56207e11af40 released
> >> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
> >> | > snmp_agent: canceling next walk for asp 0x56207e118f00
> >> | > snmp_agent: REMOVE session == 0x56207e118f00
> >> | > snmp_agent: agent_session 0x56207e118f00 released
> >> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
> >> | > snmp_agent: canceling next walk for asp 0x56207e11b540
> >> | > snmp_agent: REMOVE session == 0x56207e11b540
> >> | > snmp_agent: agent_session 0x56207e11b540 released
> >> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
> >> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
> >> | > snmp_agent: REMOVE session == 0x56207e11bd00
> >> | > snmp_agent: agent_session 0x56207e11bd00 released
> >> | > agentx/master: Continue removing delegated subsession reqests
> >> | > agentx/master: close transport
> >> | > snmp_agent: REMOVE session == 0x56207dfd5400
> >> | > agentx/master: response too late on session 0x56207dfd5400
> >> | > agentx/master: response too late on session 0x56207dfd5400
> >> | > double free or corruption (fasttop)
> >> | > Aborted (core dumped)
> >> | >
> >> | >
> >> | > What's interesting is that when I run it with -DALL it passes (at least for
> >> | > several rounds).
> >> | > It looks like some strange race condition.
> >> | >
> >> | > Regards
> >> | >
> >> | > Josef Ridky
> >> | > Software Engineer
> >> | > Core Services Team
> >> | > Red Hat Czech, s.r.o.
> >> | >
> >> | > ----- Original Message -----
> >> | > | From: "Anders Wallin" <walli...@gmail.com>
> >> | > | To: "Josef Ridky" <jri...@redhat.com>
> >> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
> >> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
> >> | > | Subject: Re: Core dump with net-snmp-5.8
> >> | > |
> >> | > | Hi Josef,
> >> | > |
> >> | > | I think it's the same issue as
> >> | > https://sourceforge.net/p/net-snmp/bugs/2914/
> >> | > | (where I also posted the solution)
> >> | > | Regards
> >> | > | Anders Wallin
> >> | > |
> >> | > |
> >> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> wrote:
> >> | > |
> >> | > | > Hi,
> >> | > | >
> >> | > | > recently I have hit an issue in net-snmp-5.8 that is connected to the
> >> | > | > bug report [1].
> >> | > | >
> >> | > | > When I tried to run the agentofdeath test from [1], the snmpd daemon crashed
> >> | > | > with a "malloc(): smallbin double linked list corrupted" or "double free()"
> >> | > | > issue and dumped core (see below).
> >> | > | > From the log file, I can identify one issue with "Unknown operation".
> >> | > | >
> >> | > | > This issue is in the agentx_got_response function
> >> | > | > (agent/mibgroup/agentx/master.c). No action is implemented for
> >> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
> >> | > | > include/net-snmp/library/snmp_api.h).
> >> | > | > As a result, "Unknown operation 6 in agentx_got_response" is shown in the
> >> | > | > log file.
> >> | > | >
> >> | > | > /var/log/messages
> >> | > | > -------------------------------
> >> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
> >> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
> >> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double linked list corrupted
> >> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump (PID 13652/UID 0).
> >> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
> >> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with result 'core-dump'.
> >> | > | > -------------------------------
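
The "Unknown operation 6" lines in the log above come from a callback that
has no case for the resend operation. A minimal sketch of the general idea
follows. It is not the agentx_got_response code: the enum names, the
function name got_response and the return codes are hypothetical, and only
the value 6 for the resend operation is taken from the log message. The
resend case is treated as a normal event instead of falling through to the
"unknown" branch.

```c
#include <stdio.h>

/* Hypothetical local op codes; only RESEND = 6 comes from the quoted log. */
enum cb_op {
    OP_RECEIVED  = 1,
    OP_TIMED_OUT = 2,
    OP_RESEND    = 6
};

/* Hypothetical result codes so the behaviour can be checked by a caller. */
enum cb_result { CB_HANDLED, CB_IGNORED, CB_UNKNOWN };

static enum cb_result got_response(int op)
{
    switch (op) {
    case OP_RECEIVED:
    case OP_TIMED_OUT:
        /* Real processing would happen here. */
        return CB_HANDLED;
    case OP_RESEND:
        /* A retransmission is not an error: nothing to log or clean up. */
        return CB_IGNORED;
    default:
        fprintf(stderr, "Unknown operation %d in got_response\n", op);
        return CB_UNKNOWN;
    }
}
```

Calling got_response(OP_RESEND) returns CB_IGNORED without logging, while an
unlisted value still reaches the default branch.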
> >> | > | >
> >> | > | > The "Unknown operation" callback is caused by a newly added piece of code in
> >> | > | > snmplib/snmp_api.c:
> >> | > | >
> >> | > | >  static int
> >> | > | >  snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
> >> | > | >  int incr_retries)
> >> | > | >  {
> >> | > | >
> >> | > | > ...
> >> | > | >
> >> | > | >          tv.tv_sec += tv.tv_usec / 1000000L;
> >> | > | >          tv.tv_usec %= 1000000L;
> >> | > | >          rp->expireM = tv;
> >> | > | > +        if (rp->callback)
> >> | > | > +            rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
> >> | > | > +                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> >> | > | >      }
> >> | > | >      return 0;
> >> | > | >  }
> >> | > | >
> >> | > | >
> >> | > | > When I tried to remove it, snmpd just stopped complaining about operation 6,
> >> | > | > but the core dump is still present.
> >> | > | >
> >> | > | > May I ask you for help with this issue? Do you have any idea what is causing
> >> | > | > this issue in 5.8 and how to fix it?
> >> | > | > I know that Jan Safranek fixed this for 5.7 with commit [2], but it
> >> | > | > looks like something else has changed and the issue is present again.
> >> | > | >
> >> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
> >> | > | > [2] https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
> >> | > | >
> >> | > | > Regards
> >> | > | >
> >> | > | > Josef Ridky
> >> | > | > Software Engineer
> >> | > | > Core Services Team
> >> | > | > Red Hat Czech, s.r.o.
> >> | > | >
> >> | > | >
> >> | > | >
> >> | > | > _______________________________________________
> >> | > | > Net-snmp-coders mailing list
> >> | > | > Net-snmp-coders@lists.sourceforge.net
> >> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
> >> | > | >
> >> | > |
> >> | >
> >> |
> >>
> >
> >
> >
> >
>
