The introduction of that code fixes another issue;
"commit 56c30b11f3616ea4f0c38a21e08e78f050096020
Author: Bill Fenner <fen...@gmail.com>
Date:   Wed Dec 20 21:52:10 2017 +0000

    NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3
agent

    With the patch in 1214, the snmp_api code assumed that if magic was
    set, it was the "struct synch-state" from snmp_client.  Of course,
    magic belongs to the caller, and the perl library uses it differently,
    so reaching into it is verboten.  Introduce a new callback (that
    was already introduced in 5.8) to report this "retries exceeded"
    state, and use it in snmp_client."

I think the problem is really about shutting down the agentx connection
when one(1) response is to late. I have
done 2 patches (one that only write a better log message and one that
removes the "bad" code.
With these patches I don't get any crash. I think that 5.7.3 has this issue
as well, but it can not be crashed with the agentofdead code

Can you please try this?

Regards
Anders Wallin


On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:

> Hi,
>
> I have compared net-snmp-5.7.3 and net-snmp-5.8 and I have found, that
> following callbacks in snmplib/snmp_api.c causes the core dump issue:
>
> --- old/snmplib/snmp_api.c      2019-04-03 12:13:55.126769866 +0200
> +++ new/snmplib/snmp_api.c      2019-04-03 12:15:18.353420790 +0200
> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
>          sp->s_errno = errno;
>          snmp_set_detail(strerror(errno));
> -        if (rp->callback)
> +/*        if (rp->callback)
>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>          return -1;
>      } else {
>          netsnmp_get_monotonic_clock(&now);
> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
>          tv.tv_sec += tv.tv_usec / 1000000L;
>          tv.tv_usec %= 1000000L;
>          rp->expireM = tv;
> -        if (rp->callback)
> +/*        if (rp->callback)
>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>      }
>      return 0;
>  }
>
> Without them, all works as expected.
>
> Josef Ridky
> Software Engineer
> Core Services Team
> Red Hat Czech, s.r.o.
>
> ----- Original Message -----
> | From: "Anders Wallin" <walli...@gmail.com>
> | To: "Josef Ridky" <jri...@redhat.com>
> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
> | Sent: Tuesday, April 2, 2019 6:27:54 PM
> | Subject: Re: Core dump with net-snmp-5.8
> |
> | Hi Josef,
> | I can reproduce the issue using the master branch, I will take a look at
> it
> | later tonight or tomorrow
> |
> | Regards
> | Anders Wallin
> |
> |
> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
> |
> | > Hi,
> | >
> | > thanks for your patch. Unfortunately, even when I have applied it, it
> | > still ends with core dump due of 'double free or corruption (fasttop)'
> | >
> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
> | >
> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
> | > snmp_agent: delegate session == 0x56207e165240
> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
> | > agentx/master: callback resend
> | > agentx/master: callback resend
> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
> | > agentx/master: close 0x56207dfd5400, -1
> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
> | > snmp_agent: processing delegated request, asp = 0x56207e165240
> | > snmp_agent: canceling next walk for asp 0x56207e165240
> | > snmp_agent: REMOVE session == 0x56207e165240
> | > snmp_agent: agent_session 0x56207e165240 released
> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
> | > snmp_agent: REMOVE session == 0x56207e1041a0
> | > snmp_agent: agent_session 0x56207e1041a0 released
> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
> | > snmp_agent: REMOVE session == 0x56207e1656c0
> | > snmp_agent: agent_session 0x56207e1656c0 released
> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
> | > snmp_agent: canceling next walk for asp 0x56207e11af40
> | > snmp_agent: REMOVE session == 0x56207e11af40
> | > snmp_agent: agent_session 0x56207e11af40 released
> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
> | > snmp_agent: canceling next walk for asp 0x56207e118f00
> | > snmp_agent: REMOVE session == 0x56207e118f00
> | > snmp_agent: agent_session 0x56207e118f00 released
> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
> | > snmp_agent: canceling next walk for asp 0x56207e11b540
> | > snmp_agent: REMOVE session == 0x56207e11b540
> | > snmp_agent: agent_session 0x56207e11b540 released
> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
> | > snmp_agent: REMOVE session == 0x56207e11bd00
> | > snmp_agent: agent_session 0x56207e11bd00 released
> | > agentx/master: Continue removing delegated subsession reqests
> | > agentx/master: close transport
> | > snmp_agent: REMOVE session == 0x56207dfd5400
> | > agentx/master: response too late on session 0x56207dfd5400
> | > agentx/master: response too late on session 0x56207dfd5400
> | > double free or corruption (fasttop)
> | > Aborted (core dumped)
> | >
> | >
> | > What's interesting, when I run it with -DALL it pass (at least for
> several
> | > rounds).
> | > It looks like some strange race condition.
> | >
> | > Regards
> | >
> | > Josef Ridky
> | > Software Engineer
> | > Core Services Team
> | > Red Hat Czech, s.r.o.
> | >
> | > ----- Original Message -----
> | > | From: "Anders Wallin" <walli...@gmail.com>
> | > | To: "Josef Ridky" <jri...@redhat.com>
> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
> | > | Subject: Re: Core dump with net-snmp-5.8
> | > |
> | > | Hi Josef,
> | > |
> | > | I think it's the same issue as
> | > https://sourceforge.net/p/net-snmp/bugs/2914/
> | > | (where I also posted the solution)
> | > | Regards
> | > | Anders Wallin
> | > |
> | > |
> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com>
> wrote:
> | > |
> | > | > Hi,
> | > | >
> | > | > recently, I have hit to an issue in net-snmp-5.8, that is
> connected to
> | > the
> | > | > bug report [1].
> | > | >
> | > | > When I tried to run agentofdeath test from [1], snmpd daemon will
> crash
> | > | > with malloc(): smallbin double linked list corrupted or double
> free()
> | > issue
> | > | > and dumps core (see bellow).
> | > | > From log file, I can identified one issue with "Unknown operation".
> | > | >
> | > | > This issue is in the agentx_got_response function
> | > | > (agent/mibgroup/agentx/master.c). There isn't implemented action
> for
> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
> | > | > include/net-snmp/library/snmp_api.h).
> | > | > As result "Unknown operation 6 in agentx_got_response" is shown in
> log
> | > | > file.
> | > | >
> | > | > /var/log/messages
> | > | > -------------------------------
> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in
> | > | > agentx_got_response
> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in
> | > | > agentx_got_response
> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double
> | > linked
> | > | > list corrupted
> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump
> (PID
> | > | > 13652/UID 0).
> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process
> | > exited,
> | > | > code=dumped, status=6/ABRT
> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with
> result
> | > | > 'core-dump'.
> | > | > -------------------------------
> | > | >
> | > | > The "Unknown operation" callback is caused by newly added piece of
> | > code in
> | > | > snmplib/snmp_api.c:
> | > | >
> | > | >  static int
> | > | >  snmp_resend_request(struct session_list *slp, netsnmp_request_list
> | > *rp,
> | > | >  int incr_retries)
> | > | >  {
> | > | >
> | > | > ...
> | > | >
> | > | >          tv.tv_sec += tv.tv_usec / 1000000L;
> | > | >          tv.tv_usec %= 1000000L;
> | > | >          rp->expireM = tv;
> | > | > +        if (rp->callback)
> | > | > +            rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
> | > | > +                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> | > | >      }
> | > | >      return 0;
> | > | >  }
> | > | >
> | > | >
> | > | > When I tried to remove it, it just stop complaining about
> operation 6,
> | > but
> | > | > the core dump is still present.
> | > | >
> | > | > May I ask you for help with this issue? Do you have any idea, what
> | > causing
> | > | > this issue in 5.8 and how to fix it?
> | > | > I know, that Jan Safranek has fixed this for 5.7 by commit [2],
> but it
> | > | > looks like something other has changed and this issue is current
> again.
> | > | >
> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
> | > | > [2]
> | > | >
> | >
> https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
> | > | >
> | > | > Regards
> | > | >
> | > | > Josef Ridky
> | > | > Software Engineer
> | > | > Core Services Team
> | > | > Red Hat Czech, s.r.o.
> | > | >
> | > | >
> | > | >
> | > | > _______________________________________________
> | > | > Net-snmp-coders mailing list
> | > | > Net-snmp-coders@lists.sourceforge.net
> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
> | > | >
> | > |
> | >
> |
>
From fa3f1e5fc9407a6e2eff233e5eab39d1f069a801 Mon Sep 17 00:00:00 2001
Message-Id: <fa3f1e5fc9407a6e2eff233e5eab39d1f069a801.1554297562.git.walli...@gmail.com>
From: Anders Wallin <walli...@gmail.com>
Date: Wed, 3 Apr 2019 09:42:29 +0200
Subject: [PATCH 1/2] agentx: logging to late responses

Signed-off-by: Anders Wallin <walli...@gmail.com>
---
 agent/mibgroup/agentx/master.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/agent/mibgroup/agentx/master.c b/agent/mibgroup/agentx/master.c
index 99c4123ed..6a561a32d 100644
--- a/agent/mibgroup/agentx/master.c
+++ b/agent/mibgroup/agentx/master.c
@@ -216,8 +216,9 @@ agentx_got_response(int operation,
 
     cache = netsnmp_handler_check_cache(cache);
     if (!cache) {
-        DEBUGMSGTL(("agentx/master", "response too late on session %8p\n",
-                    session));
+        snmp_log(LOG_WARNING,
+                 "agentx_got_response, response too late on session %p, magic=%p\n",
+                 session, magic);
         /* response is too late, free the cache */
         if (magic)
             netsnmp_free_delegated_cache((netsnmp_delegated_cache*) magic);
-- 
2.21.0

From 3320b84eec269037a10af369e2a1290d179dc776 Mon Sep 17 00:00:00 2001
Message-Id: <3320b84eec269037a10af369e2a1290d179dc776.1554297562.git.walli...@gmail.com>
In-Reply-To: <fa3f1e5fc9407a6e2eff233e5eab39d1f069a801.1554297562.git.walli...@gmail.com>
References: <fa3f1e5fc9407a6e2eff233e5eab39d1f069a801.1554297562.git.walli...@gmail.com>
From: Anders Wallin <walli...@gmail.com>
Date: Wed, 3 Apr 2019 09:38:37 +0200
Subject: [PATCH 2/2] agentx: do not shut down all sessions when one session
 times out

Signed-off-by: Anders Wallin <walli...@gmail.com>
---
 agent/mibgroup/agentx/master.c | 21 ---------------------
 1 file changed, 21 deletions(-)

diff --git a/agent/mibgroup/agentx/master.c b/agent/mibgroup/agentx/master.c
index 6a561a32d..5b00f2d5a 100644
--- a/agent/mibgroup/agentx/master.c
+++ b/agent/mibgroup/agentx/master.c
@@ -228,7 +228,6 @@ agentx_got_response(int operation,
 
     switch (operation) {
     case NETSNMP_CALLBACK_OP_TIMED_OUT:{
-            void           *s = snmp_sess_pointer(session);
             DEBUGMSGTL(("agentx/master", "timeout on session %8p req=0x%x\n",
                         session, (unsigned)reqid));
 
@@ -238,26 +237,6 @@ agentx_got_response(int operation,
                                       /* XXXWWW: should be index=0 */
                                       SNMP_ERR_GENERR);
 
-            /*
-             * This is a bit sledgehammer because the other sessions on this
-             * transport may be okay (e.g. some thread in the subagent has
-             * wedged, but the others are alright).  OTOH the overwhelming
-             * probability is that the whole agent has died somehow.  
-             */
-
-            if (s != NULL) {
-                netsnmp_transport *t = snmp_sess_transport(s);
-                close_agentx_session(session, -1);
-
-                if (t != NULL) {
-                    DEBUGMSGTL(("agentx/master", "close transport\n"));
-                    t->f_close(t);
-                } else {
-                    DEBUGMSGTL(("agentx/master", "NULL transport??\n"));
-                }
-            } else {
-                DEBUGMSGTL(("agentx/master", "NULL sess_pointer??\n"));
-            }
             ax_session = (netsnmp_session *) cache->localinfo;
             netsnmp_free_agent_snmp_session_by_session(ax_session, NULL);
             netsnmp_free_delegated_cache(cache);
-- 
2.21.0

_______________________________________________
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders

Reply via email to