Hi folks,

Thanks for your solution. I have tested it internally and everything works as expected.
I hope this will soon be part of net-snmp-5.8.
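
For anyone reading the archive later, here is a minimal sketch of the two ideas discussed in the thread below, as I understand them: treat a resend as a normal event in the callback, and on a failed send remove the request from the session only when its callback has actually run. This is not the actual patch. Every `toy_*` name and type is made up for illustration, and only the value 6 for the resend operation is taken from the "Unknown operation 6" log line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy operation codes. Only the value 6 for "resend" comes from the
 * "Unknown operation 6" log line; the names are illustrative. */
enum toy_op { TOY_OP_TIMED_OUT = 2, TOY_OP_SEND_FAILED = 3, TOY_OP_RESEND = 6 };

typedef int (*toy_callback)(int op, void *cb_data);

/* Hypothetical stand-in for one pending request on a session. */
struct toy_request {
    struct toy_request *next;
    toy_callback callback;
    void *cb_data;
};

/* Hypothetical stand-in for a session and its pending-request list. */
struct toy_session {
    struct toy_request *requests;
};

/* Callback sketch: a resend is a normal event, so return early instead
 * of falling through to the "unknown operation" path. */
static int toy_got_response(int op, void *cb_data)
{
    int *finished = cb_data;

    switch (op) {
    case TOY_OP_RESEND:
        return 1;            /* request is still pending, nothing to release */
    case TOY_OP_SEND_FAILED:
    case TOY_OP_TIMED_OUT:
        (*finished)++;       /* owner releases its per-request state here */
        return 1;
    default:
        return 0;            /* unknown operation */
    }
}

/* Push a new pending request onto the session list. */
static struct toy_request *toy_add_request(struct toy_session *sp,
                                           toy_callback cb, void *cb_data)
{
    struct toy_request *rp = calloc(1, sizeof(*rp));

    if (rp == NULL)
        return NULL;
    rp->callback = cb;
    rp->cb_data = cb_data;
    rp->next = sp->requests;
    sp->requests = rp;
    return rp;
}

/* Unlink rp from the session list and free it; returns 1 if it was found. */
static int toy_remove_request(struct toy_session *sp, struct toy_request *rp)
{
    struct toy_request **pp;

    for (pp = &sp->requests; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == rp) {
            *pp = rp->next;
            free(rp);
            return 1;
        }
    }
    return 0;
}

/* Failed send: report SEND_FAILED and drop the request, but only when a
 * callback exists and has run, so that a later timeout pass cannot report
 * the same request a second time. */
static void toy_send_failed(struct toy_session *sp, struct toy_request *rp)
{
    assert(sp != NULL && rp != NULL);
    if (rp->callback != NULL) {
        rp->callback(TOY_OP_SEND_FAILED, rp->cb_data);
        toy_remove_request(sp, rp);
    }
}

/* Timeout pass: report TIMED_OUT for everything still pending, then
 * remove it. Returns the number of requests that were still on the list. */
static int toy_timeout_pass(struct toy_session *sp)
{
    int n = 0;

    while (sp->requests != NULL) {
        struct toy_request *rp = sp->requests;

        if (rp->callback != NULL)
            rp->callback(TOY_OP_TIMED_OUT, rp->cb_data);
        toy_remove_request(sp, rp);
        n++;
    }
    return n;
}
```

In this toy model a request whose send failed is reported to its owner exactly once, because it is already gone from the list when the timeout pass runs. A request without a callback stays on the list until the timeout pass removes it.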

Regards

Josef Ridky
Software Engineer
Core Services Team
Red Hat Czech, s.r.o.

----- Original Message -----
| From: "Anders Wallin" <walli...@gmail.com>
| To: "Masayoshi Mizuma" <msys.miz...@gmail.com>
| Cc: "Josef Ridky" <jri...@redhat.com>, "Net-SNMP Coders" <net-snmp-coders@lists.sourceforge.net>
| Sent: Tuesday, April 9, 2019 12:54:09 PM
| Subject: Re: Core dump with net-snmp-5.8
| 
| Now it works fine!
| 
| thx
| Anders Wallin
| 
| 
| On Tue, Apr 9, 2019 at 2:26 AM Masayoshi Mizuma <msys.miz...@gmail.com>
| wrote:
| 
| > Hi Anders,
| >
| > Thank you for your feedback!
| > I attach the v2 patch. Could you try it?
| >
| > In the v1 patch, I missed the check for the request callback, so the
| > request gets removed even though the callback doesn't run.
| >
| > Thanks,
| > Masa
| >
| > On 4/8/19 11:06 AM, Anders Wallin wrote:
| > > Hi Masa,
| > >
| > > It looks like it solves the problem reported by Josef, BUT it breaks DTLSUDP.
| > > I ran the tests but have not analyzed why.
| > > To reproduce the issue I did the following using net-snmp master branch,
| > > plus these patches
| > > 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the sending is failed (10 minutes ago) <Masayoshi Mizuma>
| > > 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin>
| > > a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days ago) <Anders Wallin>
| > > eaad09d04 - (origin/master, origin/HEAD, master) Merge branch 'V5-8-patches' (9 weeks ago) <Bart Van Assche>
| > >
| > > $ ./configure --prefix=/usr \
| > >                 --with-persistent-directory=/var/lib/net-snmp \
| > >                 --with-mib-modules='smux tlstm-mib tsm-mib examples/example examples/notification' \
| > >                 --with-security-modules="tsm" \
| > >                 --with-transports="TLSTCP DTLSUDP" \
| > >                 --enable-shared \
| > >                 --with-defaults \
| > >                 --enable-ipv6 \
| > >                 --with-cflags="-g -O2" \
| > >                 --without-elf
| > >
| > > $ make install
| > > $ cd testing
| > > $ ./RUNFULLTESTS -g tls
| > > DTLS-UDP user certificate tests .......................... 41/?
| > >  This hangs forever in "41" with snmpd.log saying....
| > > ......
| > > 2019-04-08 16:29:11
| > > 2019-04-08 16:29:11
| > > Received 0 byte packet from DTLSUDP: unknown
| > > 2019-04-08 16:29:11
| > > 2019-04-08 16:29:13
| > > Received 0 byte packet from DTLSUDP: unknown
| > > 2019-04-08 16:29:13
| > > 2019-04-08 16:29:15
| > > Received 0 byte packet from DTLSUDP: unknown
| > > 2019-04-08 16:29:15
| > > 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 depth=0 err=18:self signed certificate
| > > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
| > > 2019-04-08 16:29:15  TLS error: SSL_read: rc=-1, sslerror = 1 (SSL_ERROR_SSL)
| > > 2019-04-08 16:29:15  TLS Error: certificate verify failed
| > > 2019-04-08 16:29:15 ---- End of OpenSSL Errors ----
| > > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
| > > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
| > > 2019-04-08 16:29:15 TLS Error: (null)
| > > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
| > > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
| > > 2019-04-08 16:29:16 TLS Error: (null)
| > > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
| > > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
| > > 2019-04-08 16:29:16 TLS Error: (null)
| > >
| > > With the fix suggested by Josef I don't see the DTLSUDP problem, but maybe
| > > there are other problems.
| > >
| > > Regards
| > > Anders Wallin
| > >
| > > PS. Thanks for adding commit info to a420d87d3. I updated the patch with your
| > > commit comments.
| > >
| > >
| > > On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com>
| > > wrote:
| > >
| > >> Hi Josef,
| > >>
| > >> I attach two patches to fix the memory inconsistency if the request is
| > >> resent and then times out.
| > >> Could you try them?
| > >>
| > >> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch
| > >>
| > >>   This patch was posted by Anders, and I tried to add the description.
| > >>   This patch fixes the missing NETSNMP_CALLBACK_OP_RESEND callback.
| > >>
| > >> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch
| > >>
| > >>   This patch fixes the race between the NETSNMP_CALLBACK_OP_SEND_FAILED
| > >>   and NETSNMP_CALLBACK_OP_TIMED_OUT callbacks. If sending the request fails,
| > >>   the request is removed from the internal session.
| > >>
| > >> Thanks,
| > >> Masa
| > >>
| > >> On 4/3/19 9:34 AM, Anders Wallin wrote:
| > >>> The introduction of that code fixes another issue;
| > >>> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
| > >>> Author: Bill Fenner <fen...@gmail.com>
| > >>> Date:   Wed Dec 20 21:52:10 2017 +0000
| > >>>
| > >>>     NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 agent
| > >>>
| > >>>     With the patch in 1214, the snmp_api code assumed that if magic was
| > >>>     set, it was the "struct synch-state" from snmp_client.  Of course,
| > >>>     magic belongs to the caller, and the perl library uses it differently,
| > >>>     so reaching into it is verboten.  Introduce a new callback (that
| > >>>     was already introduced in 5.8) to report this "retries exceeded"
| > >>>     state, and use it in snmp_client."
| > >>>
| > >>> I think the problem is really about shutting down the agentx connection
| > >>> when one (1) response is too late. I have done 2 patches (one that only
| > >>> writes a better log message and one that removes the "bad" code).
| > >>> With these patches I don't get any crash. I think that 5.7.3 has this issue
| > >>> as well, but it cannot be crashed with the agentofdeath code.
| > >>> Can you please try this?
| > >>>
| > >>> Regards
| > >>> Anders Wallin
| > >>>
| > >>>
| > >>> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:
| > >>>
| > >>>> Hi,
| > >>>>
| > >>>> I have compared net-snmp-5.7.3 and net-snmp-5.8 and found that the
| > >>>> following callbacks in snmplib/snmp_api.c cause the core dump issue:
| > >>>>
| > >>>> --- old/snmplib/snmp_api.c      2019-04-03 12:13:55.126769866 +0200
| > >>>> +++ new/snmplib/snmp_api.c      2019-04-03 12:15:18.353420790 +0200
| > >>>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
| > >>>>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
| > >>>>          sp->s_errno = errno;
| > >>>>          snmp_set_detail(strerror(errno));
| > >>>> -        if (rp->callback)
| > >>>> +/*        if (rp->callback)
| > >>>>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
| > >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
| > >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
| > >>>>          return -1;
| > >>>>      } else {
| > >>>>          netsnmp_get_monotonic_clock(&now);
| > >>>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
| > >>>>          tv.tv_sec += tv.tv_usec / 1000000L;
| > >>>>          tv.tv_usec %= 1000000L;
| > >>>>          rp->expireM = tv;
| > >>>> -        if (rp->callback)
| > >>>> +/*        if (rp->callback)
| > >>>>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
| > >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
| > >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
| > >>>>      }
| > >>>>      return 0;
| > >>>>  }
| > >>>>
| > >>>> Without them, everything works as expected.
| > >>>>
| > >>>> Josef Ridky
| > >>>> Software Engineer
| > >>>> Core Services Team
| > >>>> Red Hat Czech, s.r.o.
| > >>>>
| > >>>> ----- Original Message -----
| > >>>> | From: "Anders Wallin" <walli...@gmail.com>
| > >>>> | To: "Josef Ridky" <jri...@redhat.com>
| > >>>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
| > >>>> | Sent: Tuesday, April 2, 2019 6:27:54 PM
| > >>>> | Subject: Re: Core dump with net-snmp-5.8
| > >>>> |
| > >>>> | Hi Josef,
| > >>>> | I can reproduce the issue using the master branch. I will take a look at
| > >>>> | it later tonight or tomorrow.
| > >>>> |
| > >>>> | Regards
| > >>>> | Anders Wallin
| > >>>> |
| > >>>> |
| > >>>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
| > >>>> |
| > >>>> | > Hi,
| > >>>> | >
| > >>>> | > thanks for your patch. Unfortunately, even with it applied, snmpd
| > >>>> | > still ends with a core dump due to 'double free or corruption (fasttop)'.
| > >>>> | >
| > >>>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
| > >>>> | >
| > >>>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
| > >>>> | > snmp_agent: delegate session == 0x56207e165240
| > >>>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
| > >>>> | > agentx/master: callback resend
| > >>>> | > agentx/master: callback resend
| > >>>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
| > >>>> | > agentx/master: close 0x56207dfd5400, -1
| > >>>> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
| > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e165240
| > >>>> | > snmp_agent: canceling next walk for asp 0x56207e165240
| > >>>> | > snmp_agent: REMOVE session == 0x56207e165240
| > >>>> | > snmp_agent: agent_session 0x56207e165240 released
| > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
| > >>>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
| > >>>> | > snmp_agent: REMOVE session == 0x56207e1041a0
| > >>>> | > snmp_agent: agent_session 0x56207e1041a0 released
| > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
| > >>>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
| > >>>> | > snmp_agent: REMOVE session == 0x56207e1656c0
| > >>>> | > snmp_agent: agent_session 0x56207e1656c0 released
| > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
| > >>>> | > snmp_agent: canceling next walk for asp 0x56207e11af40
| > >>>> | > snmp_agent: REMOVE session == 0x56207e11af40
| > >>>> | > snmp_agent: agent_session 0x56207e11af40 released
| > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
| > >>>> | > snmp_agent: canceling next walk for asp 0x56207e118f00
| > >>>> | > snmp_agent: REMOVE session == 0x56207e118f00
| > >>>> | > snmp_agent: agent_session 0x56207e118f00 released
| > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
| > >>>> | > snmp_agent: canceling next walk for asp 0x56207e11b540
| > >>>> | > snmp_agent: REMOVE session == 0x56207e11b540
| > >>>> | > snmp_agent: agent_session 0x56207e11b540 released
| > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
| > >>>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
| > >>>> | > snmp_agent: REMOVE session == 0x56207e11bd00
| > >>>> | > snmp_agent: agent_session 0x56207e11bd00 released
| > >>>> | > agentx/master: Continue removing delegated subsession reqests
| > >>>> | > agentx/master: close transport
| > >>>> | > snmp_agent: REMOVE session == 0x56207dfd5400
| > >>>> | > agentx/master: response too late on session 0x56207dfd5400
| > >>>> | > agentx/master: response too late on session 0x56207dfd5400
| > >>>> | > double free or corruption (fasttop)
| > >>>> | > Aborted (core dumped)
| > >>>> | >
| > >>>> | >
| > >>>> | > Interestingly, when I run it with -DALL it passes (at least for several
| > >>>> | > rounds).
| > >>>> | > It looks like some strange race condition.
| > >>>> | >
| > >>>> | > Regards
| > >>>> | >
| > >>>> | > Josef Ridky
| > >>>> | > Software Engineer
| > >>>> | > Core Services Team
| > >>>> | > Red Hat Czech, s.r.o.
| > >>>> | >
| > >>>> | > ----- Original Message -----
| > >>>> | > | From: "Anders Wallin" <walli...@gmail.com>
| > >>>> | > | To: "Josef Ridky" <jri...@redhat.com>
| > >>>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
| > >>>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
| > >>>> | > | Subject: Re: Core dump with net-snmp-5.8
| > >>>> | > |
| > >>>> | > | Hi Josef,
| > >>>> | > |
| > >>>> | > | I think it's the same issue as https://sourceforge.net/p/net-snmp/bugs/2914/
| > >>>> | > | (where I also posted the solution)
| > >>>> | > | Regards
| > >>>> | > | Anders Wallin
| > >>>> | > |
| > >>>> | > |
| > >>>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> wrote:
| > >>>> | > |
| > >>>> | > | > Hi,
| > >>>> | > | >
| > >>>> | > | > recently I hit an issue in net-snmp-5.8 that is connected to the
| > >>>> | > | > bug report [1].
| > >>>> | > | >
| > >>>> | > | > When I run the agentofdeath test from [1], the snmpd daemon crashes
| > >>>> | > | > with a "malloc(): smallbin double linked list corrupted" or double free()
| > >>>> | > | > error and dumps core (see below).
| > >>>> | > | > In the log file, I could identify one issue, "Unknown operation".
| > >>>> | > | >
| > >>>> | > | > This issue is in the agentx_got_response function
| > >>>> | > | > (agent/mibgroup/agentx/master.c). No action is implemented for
| > >>>> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
| > >>>> | > | > include/net-snmp/library/snmp_api.h).
| > >>>> | > | > As a result, "Unknown operation 6 in agentx_got_response" is shown in
| > >>>> | > | > the log file.
| > >>>> | > | >
| > >>>> | > | > /var/log/messages
| > >>>> | > | > -------------------------------
| > >>>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
| > >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
| > >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double linked list corrupted
| > >>>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump (PID 13652/UID 0).
| > >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
| > >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with result 'core-dump'.
| > >>>> | > | > -------------------------------
| > >>>> | > | >
| > >>>> | > | > The "Unknown operation" callback is caused by a newly added piece of
| > >>>> | > | > code in snmplib/snmp_api.c:
| > >>>> | > | >
| > >>>> | > | >  static int
| > >>>> | > | >  snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
| > >>>> | > | >  int incr_retries)
| > >>>> | > | >  {
| > >>>> | > | >
| > >>>> | > | > ...
| > >>>> | > | >
| > >>>> | > | >          tv.tv_sec += tv.tv_usec / 1000000L;
| > >>>> | > | >          tv.tv_usec %= 1000000L;
| > >>>> | > | >          rp->expireM = tv;
| > >>>> | > | > +        if (rp->callback)
| > >>>> | > | > +            rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
| > >>>> | > | > +                         rp->pdu->reqid, rp->pdu, rp->cb_data);
| > >>>> | > | >      }
| > >>>> | > | >      return 0;
| > >>>> | > | >  }
| > >>>> | > | >
| > >>>> | > | >
| > >>>> | > | > When I removed it, snmpd just stopped complaining about operation 6,
| > >>>> | > | > but the core dump is still present.
| > >>>> | > | >
| > >>>> | > | > May I ask you for help with this issue? Do you have any idea what is
| > >>>> | > | > causing this issue in 5.8 and how to fix it?
| > >>>> | > | > I know that Jan Safranek fixed this for 5.7 with commit [2], but it
| > >>>> | > | > looks like something else has changed and the issue is back again.
| > >>>> | > | >
| > >>>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
| > >>>> | > | > [2] https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
| > >>>> | > | >
| > >>>> | > | > Regards
| > >>>> | > | >
| > >>>> | > | > Josef Ridky
| > >>>> | > | > Software Engineer
| > >>>> | > | > Core Services Team
| > >>>> | > | > Red Hat Czech, s.r.o.
| > >>>> | > | >
| > >>>> | > | >
| > >>>> | > | >
| > >>>> | > | > _______________________________________________
| > >>>> | > | > Net-snmp-coders mailing list
| > >>>> | > | > Net-snmp-coders@lists.sourceforge.net
| > >>>> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
| > >>>> | > | >
| > >>>> | > |
| > >>>> | >
| > >>>> |
| > >>>>
| > >>>
| > >>>
| > >>>
| > >>>
| > >>
| > >
| >
| 

