Now it works fine!

thx
Anders Wallin


On Tue, Apr 9, 2019 at 2:26 AM Masayoshi Mizuma <msys.miz...@gmail.com>
wrote:

> Hi Anders,
>
> Thank you for your feedback!
> I attach the v2 patch. Could you try it?
>
> In the v1 patch, I missed the check for the request callback, so the
> request was removed even though the callback didn't run.
>
> Thanks,
> Masa
>
> On 4/8/19 11:06 AM, Anders Wallin wrote:
> > Hi Masa,
> >
> > It looks like it solves the problem reported by Josef, BUT it breaks
> > DTLSUDP. I ran the tests without analyzing why.
> > To reproduce the issue I did the following using the net-snmp master branch,
> > plus these patches:
> > 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the sending is failed (10 minutes ago) <Masayoshi Mizuma>
> > 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin>
> > a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days ago) <Anders Wallin>
> > eaad09d04 - (origin/master, origin/HEAD, master) Merge branch 'V5-8-patches' (9 weeks ago) <Bart Van Assche>
> >
> > $ ./configure --prefix=/usr \
> >                 --with-persistent-directory=/var/lib/net-snmp \
> >                 --with-mib-modules='smux tlstm-mib tsm-mib examples/example examples/notification' \
> >                 --with-security-modules="tsm" \
> >                 --with-transports="TLSTCP DTLSUDP" \
> >                 --enable-shared \
> >                 --with-defaults \
> >                 --enable-ipv6 \
> >                 --with-cflags="-g -O2" \
> >                 --without-elf
> >
> > $ make install
> > $ cd testing
> > $ ./RUNFULLTESTS -g tls
> > DTLS-UDP user certificate tests .......................... 41/?
> >  This hangs forever in "41" with snmpd.log saying....
> > ......
> > 2019-04-08 16:29:11
> > 2019-04-08 16:29:11
> > Received 0 byte packet from DTLSUDP: unknown
> > 2019-04-08 16:29:11
> > 2019-04-08 16:29:13
> > Received 0 byte packet from DTLSUDP: unknown
> > 2019-04-08 16:29:13
> > 2019-04-08 16:29:15
> > Received 0 byte packet from DTLSUDP: unknown
> > 2019-04-08 16:29:15
> > 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 depth=0 err=18:self signed certificate
> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
> > 2019-04-08 16:29:15  TLS error: SSL_read: rc=-1, sslerror = 1 (SSL_ERROR_SSL)
> > 2019-04-08 16:29:15  TLS Error: certificate verify failed
> > 2019-04-08 16:29:15 ---- End of OpenSSL Errors ----
> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
> > 2019-04-08 16:29:15 TLS Error: (null)
> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
> > 2019-04-08 16:29:16 TLS Error: (null)
> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
> > 2019-04-08 16:29:16 TLS Error: (null)
> >
> > With the fix suggested by Josef I don't see the DTLSUDP problem, but maybe
> > there are other problems.
> >
> > Regards
> > Anders Wallin
> >
> > PS. Thanks for adding commit info to a420d87d3; I updated the patch with your
> > commit comments.
> >
> >
> > On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com>
> > wrote:
> >
> >> Hi Josef,
> >>
> >> I attach two patches to fix the memory inconsistency that occurs when the
> >> request is resent and times out.
> >> Could you try them?
> >>
> >> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch
> >>
> >>   This patch was posted by Anders, and I tried to add the description.
> >>   This patch fixes the missing NETSNMP_CALLBACK_OP_RESEND callback.
> >>
> >> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch
> >>
> >>   This patch fixes the race between the NETSNMP_CALLBACK_OP_SEND_FAILED
> >>   and NETSNMP_CALLBACK_OP_TIMED_OUT callbacks. If sending the request fails,
> >>   the request is removed from the internal session.
> >>
> >> Thanks,
> >> Masa
> >>
> >> On 4/3/19 9:34 AM, Anders Wallin wrote:
> >>> The introduction of that code fixes another issue;
> >>> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
> >>> Author: Bill Fenner <fen...@gmail.com>
> >>> Date:   Wed Dec 20 21:52:10 2017 +0000
> >>>
> >>>     NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 agent
> >>>
> >>>     With the patch in 1214, the snmp_api code assumed that if magic was
> >>>     set, it was the "struct synch-state" from snmp_client.  Of course,
> >>>     magic belongs to the caller, and the perl library uses it differently,
> >>>     so reaching into it is verboten.  Introduce a new callback (that
> >>>     was already introduced in 5.8) to report this "retries exceeded"
> >>>     state, and use it in snmp_client."
> >>>
> >>> I think the problem is really about shutting down the agentx connection
> >>> when a single response is too late. I have
> >>> made 2 patches (one that only writes a better log message and one that
> >>> removes the "bad" code).
> >>> With these patches I don't get any crash. I think that 5.7.3 has this issue
> >>> as well, but it cannot be crashed with the agentofdeath code.
> >>>
> >>> Can you please try this?
> >>>
> >>> Regards
> >>> Anders Wallin
> >>>
> >>>
> >>> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I have compared net-snmp-5.7.3 and net-snmp-5.8 and found that the
> >>>> following callbacks in snmplib/snmp_api.c cause the core dump issue:
> >>>>
> >>>> --- old/snmplib/snmp_api.c      2019-04-03 12:13:55.126769866 +0200
> >>>> +++ new/snmplib/snmp_api.c      2019-04-03 12:15:18.353420790 +0200
> >>>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
> >>>>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
> >>>>          sp->s_errno = errno;
> >>>>          snmp_set_detail(strerror(errno));
> >>>> -        if (rp->callback)
> >>>> +/*        if (rp->callback)
> >>>>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
> >>>>          return -1;
> >>>>      } else {
> >>>>          netsnmp_get_monotonic_clock(&now);
> >>>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
> >>>>          tv.tv_sec += tv.tv_usec / 1000000L;
> >>>>          tv.tv_usec %= 1000000L;
> >>>>          rp->expireM = tv;
> >>>> -        if (rp->callback)
> >>>> +/*        if (rp->callback)
> >>>>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
> >>>>      }
> >>>>      return 0;
> >>>>  }
> >>>>
> >>>> Without them, all works as expected.
> >>>>
> >>>> Josef Ridky
> >>>> Software Engineer
> >>>> Core Services Team
> >>>> Red Hat Czech, s.r.o.
> >>>>
> >>>> ----- Original Message -----
> >>>> | From: "Anders Wallin" <walli...@gmail.com>
> >>>> | To: "Josef Ridky" <jri...@redhat.com>
> >>>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
> >>>> | Sent: Tuesday, April 2, 2019 6:27:54 PM
> >>>> | Subject: Re: Core dump with net-snmp-5.8
> >>>> |
> >>>> | Hi Josef,
> >>>> | I can reproduce the issue using the master branch. I will take a look at it
> >>>> | later tonight or tomorrow.
> >>>> |
> >>>> | Regards
> >>>> | Anders Wallin
> >>>> |
> >>>> |
> >>>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
> >>>> |
> >>>> | > Hi,
> >>>> | >
> >>>> | > thanks for your patch. Unfortunately, even with it applied, it
> >>>> | > still ends with a core dump due to 'double free or corruption (fasttop)'.
> >>>> | >
> >>>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
> >>>> | >
> >>>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
> >>>> | > snmp_agent: delegate session == 0x56207e165240
> >>>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
> >>>> | > agentx/master: callback resend
> >>>> | > agentx/master: callback resend
> >>>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
> >>>> | > agentx/master: close 0x56207dfd5400, -1
> >>>> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e165240
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e165240
> >>>> | > snmp_agent: REMOVE session == 0x56207e165240
> >>>> | > snmp_agent: agent_session 0x56207e165240 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
> >>>> | > snmp_agent: REMOVE session == 0x56207e1041a0
> >>>> | > snmp_agent: agent_session 0x56207e1041a0 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
> >>>> | > snmp_agent: REMOVE session == 0x56207e1656c0
> >>>> | > snmp_agent: agent_session 0x56207e1656c0 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11af40
> >>>> | > snmp_agent: REMOVE session == 0x56207e11af40
> >>>> | > snmp_agent: agent_session 0x56207e11af40 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e118f00
> >>>> | > snmp_agent: REMOVE session == 0x56207e118f00
> >>>> | > snmp_agent: agent_session 0x56207e118f00 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11b540
> >>>> | > snmp_agent: REMOVE session == 0x56207e11b540
> >>>> | > snmp_agent: agent_session 0x56207e11b540 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
> >>>> | > snmp_agent: REMOVE session == 0x56207e11bd00
> >>>> | > snmp_agent: agent_session 0x56207e11bd00 released
> >>>> | > agentx/master: Continue removing delegated subsession reqests
> >>>> | > agentx/master: close transport
> >>>> | > snmp_agent: REMOVE session == 0x56207dfd5400
> >>>> | > agentx/master: response too late on session 0x56207dfd5400
> >>>> | > agentx/master: response too late on session 0x56207dfd5400
> >>>> | > double free or corruption (fasttop)
> >>>> | > Aborted (core dumped)
> >>>> | >
> >>>> | >
> >>>> | > Interestingly, when I run it with -DALL it passes (at least for several
> >>>> | > rounds).
> >>>> | > It looks like some strange race condition.
> >>>> | >
> >>>> | > Regards
> >>>> | >
> >>>> | > Josef Ridky
> >>>> | > Software Engineer
> >>>> | > Core Services Team
> >>>> | > Red Hat Czech, s.r.o.
> >>>> | >
> >>>> | > ----- Original Message -----
> >>>> | > | From: "Anders Wallin" <walli...@gmail.com>
> >>>> | > | To: "Josef Ridky" <jri...@redhat.com>
> >>>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
> >>>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
> >>>> | > | Subject: Re: Core dump with net-snmp-5.8
> >>>> | > |
> >>>> | > | Hi Josef,
> >>>> | > |
> >>>> | > | I think it's the same issue as https://sourceforge.net/p/net-snmp/bugs/2914/
> >>>> | > | (where I also posted the solution).
> >>>> | > | Regards
> >>>> | > | Anders Wallin
> >>>> | > |
> >>>> | > |
> >>>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> wrote:
> >>>> | > |
> >>>> | > | > Hi,
> >>>> | > | >
> >>>> | > | > recently I hit an issue in net-snmp-5.8 that is connected to the
> >>>> | > | > bug report [1].
> >>>> | > | >
> >>>> | > | > When I run the agentofdeath test from [1], the snmpd daemon crashes
> >>>> | > | > with a "malloc(): smallbin double linked list corrupted" or double free()
> >>>> | > | > error and dumps core (see below).
> >>>> | > | > From the log file, I have identified one issue with "Unknown operation".
> >>>> | > | >
> >>>> | > | > This issue is in the agentx_got_response function
> >>>> | > | > (agent/mibgroup/agentx/master.c). No action is implemented for
> >>>> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
> >>>> | > | > include/net-snmp/library/snmp_api.h).
> >>>> | > | > As a result, "Unknown operation 6 in agentx_got_response" is shown in the
> >>>> | > | > log file.
> >>>> | > | >
> >>>> | > | > /var/log/messages
> >>>> | > | > -------------------------------
> >>>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double linked list corrupted
> >>>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump (PID 13652/UID 0).
> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with result 'core-dump'.
> >>>> | > | > -------------------------------
> >>>> | > | >
> >>>> | > | > The "Unknown operation" callback is caused by a newly added piece of
> >>>> | > | > code in snmplib/snmp_api.c:
> >>>> | > | >
> >>>> | > | >  static int
> >>>> | > | >  snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
> >>>> | > | >                      int incr_retries)
> >>>> | > | >  {
> >>>> | > | >
> >>>> | > | > ...
> >>>> | > | >
> >>>> | > | >          tv.tv_sec += tv.tv_usec / 1000000L;
> >>>> | > | >          tv.tv_usec %= 1000000L;
> >>>> | > | >          rp->expireM = tv;
> >>>> | > | > +        if (rp->callback)
> >>>> | > | > +            rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
> >>>> | > | > +                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> >>>> | > | >      }
> >>>> | > | >      return 0;
> >>>> | > | >  }
> >>>> | > | >
> >>>> | > | >
> >>>> | > | > When I removed it, snmpd just stopped complaining about operation 6, but
> >>>> | > | > the core dump is still present.
> >>>> | > | >
> >>>> | > | > May I ask for your help with this issue? Do you have any idea what is
> >>>> | > | > causing it in 5.8 and how to fix it?
> >>>> | > | > I know that Jan Safranek fixed this for 5.7 in commit [2], but it
> >>>> | > | > looks like something else has changed and the issue is back.
> >>>> | > | >
> >>>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
> >>>> | > | > [2] https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
> >>>> | > | >
> >>>> | > | > Regards
> >>>> | > | >
> >>>> | > | > Josef Ridky
> >>>> | > | > Software Engineer
> >>>> | > | > Core Services Team
> >>>> | > | > Red Hat Czech, s.r.o.
> >>>> | > | >
> >>>> | > | >
> >>>> | > | >
> >>>> | > | > _______________________________________________
> >>>> | > | > Net-snmp-coders mailing list
> >>>> | > | > Net-snmp-coders@lists.sourceforge.net
> >>>> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
> >>>> | > | >
> >>>> | > |
> >>>> | >
> >>>> |
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
>
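
For readers following the "Unknown operation 6" part of the thread: the message appears when a session callback's switch has no case for the resend notification. The following is only a minimal sketch in C of a callback that treats a resend as "still pending" rather than as an unknown operation. All names (demo_op, demo_state, demo_got_response) are hypothetical and are not the net-snmp API. Only the value 6 for the resend operation comes from the log quoted above; the other operation numbers and the return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the NETSNMP_CALLBACK_OP_* constants.
 * Only 6 (resend) is taken from the "Unknown operation 6" log line;
 * the remaining values are assumptions for this sketch. */
enum demo_op {
    DEMO_OP_RECEIVED    = 1,
    DEMO_OP_TIMED_OUT   = 2,
    DEMO_OP_SEND_FAILED = 3,
    DEMO_OP_RESEND      = 6
};

/* Bookkeeping so the effect of each operation can be observed. */
struct demo_state {
    int resends;   /* resend notifications seen so far */
    int failed;    /* set once the request ended with an error */
    int unknown;   /* operations the handler did not recognise */
};

/* Returns 1 when the request is finished, 0 when it is still pending. */
int demo_got_response(int op, struct demo_state *st)
{
    assert(st != NULL);

    switch (op) {
    case DEMO_OP_RECEIVED:
        return 1;
    case DEMO_OP_RESEND:
        /* A resend is informational: the request is still outstanding,
         * so nothing is freed or closed here. */
        st->resends++;
        return 0;
    case DEMO_OP_TIMED_OUT:
    case DEMO_OP_SEND_FAILED:
        st->failed = 1;
        return 1;
    default:
        st->unknown++;
        fprintf(stderr, "Unknown operation %d in demo_got_response\n", op);
        return 0;
    }
}
```

Calling demo_got_response(DEMO_OP_RESEND, &st) leaves the request pending and only bumps the resend counter, whereas an operation without a case falls through to the "Unknown operation" branch.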
_______________________________________________
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
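
The v2 description in this thread (remove a failed request from the session, but only when its callback actually ran) can be illustrated with a small sketch in C. This is one reading of that description, not the actual patch: the types and functions below (demo_request, demo_session, demo_send_failed and so on) are hypothetical, and the operation value 3 is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_OP_SEND_FAILED 3   /* assumed value, for illustration only */

typedef void (*demo_cb)(int op, int reqid, void *data);

/* Hypothetical pending-request record kept on a session. */
struct demo_request {
    int reqid;
    demo_cb callback;
    void *cb_data;
    struct demo_request *next;
};

struct demo_session {
    struct demo_request *requests;   /* singly linked list of pending requests */
};

/* Prepend a new pending request; returns NULL on allocation failure. */
struct demo_request *demo_add(struct demo_session *s, int reqid,
                              demo_cb cb, void *data)
{
    struct demo_request *rp = calloc(1, sizeof(*rp));
    if (rp == NULL)
        return NULL;
    rp->reqid = reqid;
    rp->callback = cb;
    rp->cb_data = data;
    rp->next = s->requests;
    s->requests = rp;
    return rp;
}

int demo_count(const struct demo_session *s)
{
    int n = 0;
    const struct demo_request *rp;
    for (rp = s->requests; rp != NULL; rp = rp->next)
        n++;
    return n;
}

/* Unlink and free rp; returns 1 if it was on the list, 0 otherwise. */
int demo_remove(struct demo_session *s, struct demo_request *rp)
{
    struct demo_request **pp;
    for (pp = &s->requests; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == rp) {
            *pp = rp->next;
            free(rp);
            return 1;
        }
    }
    return 0;
}

/* On a send failure, notify the owner and drop the request so that a later
 * timeout cannot fire for it. Without a callback the request stays queued. */
void demo_send_failed(struct demo_session *s, struct demo_request *rp)
{
    if (rp->callback) {
        rp->callback(DEMO_OP_SEND_FAILED, rp->reqid, rp->cb_data);
        demo_remove(s, rp);
    }
}

/* Example callback: counts how often it was invoked. */
void demo_count_cb(int op, int reqid, void *data)
{
    (void)op;
    (void)reqid;
    ++*(int *)data;
}
```

With two queued requests, one with demo_count_cb and one with a NULL callback, demo_send_failed removes only the first; the second remains on the list until something else (for example a timeout path) removes it.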