Hi Anders,

Thank you for your feedback!
I attach the v2 patch. Could you try it?

On the v1 patch, I missed the check for the request callback. So, the request
gets removed even though the callback doesn't run.

Thanks,
Masa

On 4/8/19 11:06 AM, Anders Wallin wrote:
> Hi Masa,
> 
> looks like it solves the problem reported by Josef, BUT it breaks DTLSUDP.
> I run the tests w/o analyzing why.
> To reproduce the issue I did the following using net-snmp master branch,
> plus these patches
> 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the
> sending is failed (10 minutes ago) <Masayoshi Mizuma>
> 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin>
> a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days
> ago) <Anders Wallin>
> eaad09d04 - (origin/master, origin/HEAD, master) Merge branch
> 'V5-8-patches' (9 weeks ago) <Bart Van Assche>
> 
> $ ./configure --prefix=/usr \
>                 --with-persistent-directory=/var/lib/net-snmp \
>                 --with-mib-modules='smux tlstm-mib tsm-mib examples/example
> examples/notification' \
>                 --with-security-modules="tsm" \
>                 --with-transports="TLSTCP DTLSUDP" \
>                 --enable-shared \
>                 --with-defaults \
>                 --enable-ipv6 \
>                 --with-cflags="-g -O2" \
>                 --without-elf
> 
> $ make install
> $ cd testing
> $ ./RUNFULLTESTS -g tls
> DTLS-UDP user certificate tests .......................... 41/?
>  This hangs forever in "41" with snmpd.log saying....
> ......
> 2019-04-08 16:29:11
> 2019-04-08 16:29:11
> Received 0 byte packet from DTLSUDP: unknown
> 2019-04-08 16:29:11
> 2019-04-08 16:29:13
> Received 0 byte packet from DTLSUDP: unknown
> 2019-04-08 16:29:13
> 2019-04-08 16:29:15
> Received 0 byte packet from DTLSUDP: unknown
> 2019-04-08 16:29:15
> 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170
> depth=0 err=18:self signed certificate
> 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
> 2019-04-08 16:29:15  TLS error: SSL_read: rc=-1, sslerror = 1
> (SSL_ERROR_SSL)
> 2019-04-08 16:29:15  TLS Error: certificate verify failed
> 2019-04-08 16:29:15 ---- End of OpenSSL Errors ----
> 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
> 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5
> (SSL_ERROR_SYSCALL): system_error=0 (Success)
> 2019-04-08 16:29:15 TLS Error: (null)
> 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
> 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5
> (SSL_ERROR_SYSCALL): system_error=0 (Success)
> 2019-04-08 16:29:16 TLS Error: (null)
> 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
> 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5
> (SSL_ERROR_SYSCALL): system_error=0 (Success)
> 2019-04-08 16:29:16 TLS Error: (null)
> 
> With the fix suggested på Josef I don't see the DTLSUDP problem, but maybe
> there are other problems.
> 
> Regards
> Anders Wallin
> 
> PS. thx for adding commit info to a420d87d3, I updated the patch with your
> commit comments
> 
> 
> On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com>
> wrote:
> 
>> Hi Josef,
>>
>> I attach two patches to fix the memory inconsistency if the request is
>> resend and timed out.
>> Could you try them?
>>
>> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch
>>
>>   This patch was posted by Anders, and I tried to add the description.
>>   This patch fixes the missing NETSNMP_CALLBACK_OP_RESEND callback.
>>
>> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch
>>
>>   This patch fixes the race between NETSNMP_CALLBACK_OP_SEND_FAILED
>>   and NETSNMP_CALLBACK_OP_TIMED_OUT callback. If the request is failed,
>>   then remove the request from the internal session.
>>
>> Thanks,
>> Masa
>>
>> On 4/3/19 9:34 AM, Anders Wallin wrote:
>>> The introduction of that code fixes another issue;
>>> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
>>> Author: Bill Fenner <fen...@gmail.com>
>>> Date:   Wed Dec 20 21:52:10 2017 +0000
>>>
>>>     NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3
>>> agent
>>>
>>>     With the patch in 1214, the snmp_api code assumed that if magic was
>>>     set, it was the "struct synch-state" from snmp_client.  Of course,
>>>     magic belongs to the caller, and the perl library uses it
>> differently,
>>>     so reaching into it is verboten.  Introduce a new callback (that
>>>     was already introduced in 5.8) to report this "retries exceeded"
>>>     state, and use it in snmp_client."
>>>
>>> I think the problem is really about shutting down the agentx connection
>>> when one(1) response is to late. I have
>>> done 2 patches (one that only write a better log message and one that
>>> removes the "bad" code.
>>> With these patches I don't get any crash. I think that 5.7.3 has this
>> issue
>>> as well, but it can not be crashed with the agentofdead code
>>>
>>> Can you please try this?
>>>
>>> Regards
>>> Anders Wallin
>>>
>>>
>>> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have compared net-snmp-5.7.3 and net-snmp-5.8 and I have found, that
>>>> following callbacks in snmplib/snmp_api.c causes the core dump issue:
>>>>
>>>> --- old/snmplib/snmp_api.c      2019-04-03 12:13:55.126769866 +0200
>>>> +++ new/snmplib/snmp_api.c      2019-04-03 12:15:18.353420790 +0200
>>>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
>>>>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
>>>>          sp->s_errno = errno;
>>>>          snmp_set_detail(strerror(errno));
>>>> -        if (rp->callback)
>>>> +/*        if (rp->callback)
>>>>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
>>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>>>>          return -1;
>>>>      } else {
>>>>          netsnmp_get_monotonic_clock(&now);
>>>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
>>>>          tv.tv_sec += tv.tv_usec / 1000000L;
>>>>          tv.tv_usec %= 1000000L;
>>>>          rp->expireM = tv;
>>>> -        if (rp->callback)
>>>> +/*        if (rp->callback)
>>>>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
>>>>      }
>>>>      return 0;
>>>>  }
>>>>
>>>> Without them, all works as expected.
>>>>
>>>> Josef Ridky
>>>> Software Engineer
>>>> Core Services Team
>>>> Red Hat Czech, s.r.o.
>>>>
>>>> ----- Original Message -----
>>>> | From: "Anders Wallin" <walli...@gmail.com>
>>>> | To: "Josef Ridky" <jri...@redhat.com>
>>>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>>>> | Sent: Tuesday, April 2, 2019 6:27:54 PM
>>>> | Subject: Re: Core dump with net-snmp-5.8
>>>> |
>>>> | Hi Josef,
>>>> | I can reproduce the issue using the master branch, I will take a look
>> at
>>>> it
>>>> | later tonight or tomorrow
>>>> |
>>>> | Regards
>>>> | Anders Wallin
>>>> |
>>>> |
>>>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
>>>> |
>>>> | > Hi,
>>>> | >
>>>> | > thanks for your patch. Unfortunately, even when I have applied it,
>> it
>>>> | > still ends with core dump due of 'double free or corruption
>> (fasttop)'
>>>> | >
>>>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
>>>> | >
>>>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
>>>> | > snmp_agent: delegate session == 0x56207e165240
>>>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
>>>> | > agentx/master: callback resend
>>>> | > agentx/master: callback resend
>>>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
>>>> | > agentx/master: close 0x56207dfd5400, -1
>>>> | > snmp_agent: removed 40 delegated request(s) for session
>> 0x56207dfce490
>>>> | > snmp_agent: processing delegated request, asp = 0x56207e165240
>>>> | > snmp_agent: canceling next walk for asp 0x56207e165240
>>>> | > snmp_agent: REMOVE session == 0x56207e165240
>>>> | > snmp_agent: agent_session 0x56207e165240 released
>>>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
>>>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
>>>> | > snmp_agent: REMOVE session == 0x56207e1041a0
>>>> | > snmp_agent: agent_session 0x56207e1041a0 released
>>>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
>>>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
>>>> | > snmp_agent: REMOVE session == 0x56207e1656c0
>>>> | > snmp_agent: agent_session 0x56207e1656c0 released
>>>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
>>>> | > snmp_agent: canceling next walk for asp 0x56207e11af40
>>>> | > snmp_agent: REMOVE session == 0x56207e11af40
>>>> | > snmp_agent: agent_session 0x56207e11af40 released
>>>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
>>>> | > snmp_agent: canceling next walk for asp 0x56207e118f00
>>>> | > snmp_agent: REMOVE session == 0x56207e118f00
>>>> | > snmp_agent: agent_session 0x56207e118f00 released
>>>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
>>>> | > snmp_agent: canceling next walk for asp 0x56207e11b540
>>>> | > snmp_agent: REMOVE session == 0x56207e11b540
>>>> | > snmp_agent: agent_session 0x56207e11b540 released
>>>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
>>>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
>>>> | > snmp_agent: REMOVE session == 0x56207e11bd00
>>>> | > snmp_agent: agent_session 0x56207e11bd00 released
>>>> | > agentx/master: Continue removing delegated subsession reqests
>>>> | > agentx/master: close transport
>>>> | > snmp_agent: REMOVE session == 0x56207dfd5400
>>>> | > agentx/master: response too late on session 0x56207dfd5400
>>>> | > agentx/master: response too late on session 0x56207dfd5400
>>>> | > double free or corruption (fasttop)
>>>> | > Aborted (core dumped)
>>>> | >
>>>> | >
>>>> | > What's interesting, when I run it with -DALL it pass (at least for
>>>> several
>>>> | > rounds).
>>>> | > It looks like some strange race condition.
>>>> | >
>>>> | > Regards
>>>> | >
>>>> | > Josef Ridky
>>>> | > Software Engineer
>>>> | > Core Services Team
>>>> | > Red Hat Czech, s.r.o.
>>>> | >
>>>> | > ----- Original Message -----
>>>> | > | From: "Anders Wallin" <walli...@gmail.com>
>>>> | > | To: "Josef Ridky" <jri...@redhat.com>
>>>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
>>>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
>>>> | > | Subject: Re: Core dump with net-snmp-5.8
>>>> | > |
>>>> | > | Hi Josef,
>>>> | > |
>>>> | > | I think it's the same issue as
>>>> | > https://sourceforge.net/p/net-snmp/bugs/2914/
>>>> | > | (where I also posted the solution)
>>>> | > | Regards
>>>> | > | Anders Wallin
>>>> | > |
>>>> | > |
>>>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com>
>>>> wrote:
>>>> | > |
>>>> | > | > Hi,
>>>> | > | >
>>>> | > | > recently, I have hit to an issue in net-snmp-5.8, that is
>>>> connected to
>>>> | > the
>>>> | > | > bug report [1].
>>>> | > | >
>>>> | > | > When I tried to run agentofdeath test from [1], snmpd daemon
>> will
>>>> crash
>>>> | > | > with malloc(): smallbin double linked list corrupted or double
>>>> free()
>>>> | > issue
>>>> | > | > and dumps core (see bellow).
>>>> | > | > From log file, I can identified one issue with "Unknown
>> operation".
>>>> | > | >
>>>> | > | > This issue is in the agentx_got_response function
>>>> | > | > (agent/mibgroup/agentx/master.c). There isn't implemented action
>>>> for
>>>> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
>>>> | > | > include/net-snmp/library/snmp_api.h).
>>>> | > | > As result "Unknown operation 6 in agentx_got_response" is shown
>> in
>>>> log
>>>> | > | > file.
>>>> | > | >
>>>> | > | > /var/log/messages
>>>> | > | > -------------------------------
>>>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in
>>>> | > | > agentx_got_response
>>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in
>>>> | > | > agentx_got_response
>>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin
>> double
>>>> | > linked
>>>> | > | > list corrupted
>>>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump
>>>> (PID
>>>> | > | > 13652/UID 0).
>>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main
>> process
>>>> | > exited,
>>>> | > | > code=dumped, status=6/ABRT
>>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with
>>>> result
>>>> | > | > 'core-dump'.
>>>> | > | > -------------------------------
>>>> | > | >
>>>> | > | > The "Unknown operation" callback is caused by newly added piece
>> of
>>>> | > code in
>>>> | > | > snmplib/snmp_api.c:
>>>> | > | >
>>>> | > | >  static int
>>>> | > | >  snmp_resend_request(struct session_list *slp,
>> netsnmp_request_list
>>>> | > *rp,
>>>> | > | >  int incr_retries)
>>>> | > | >  {
>>>> | > | >
>>>> | > | > ...
>>>> | > | >
>>>> | > | >          tv.tv_sec += tv.tv_usec / 1000000L;
>>>> | > | >          tv.tv_usec %= 1000000L;
>>>> | > | >          rp->expireM = tv;
>>>> | > | > +        if (rp->callback)
>>>> | > | > +            rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
>>>> | > | > +                         rp->pdu->reqid, rp->pdu, rp->cb_data);
>>>> | > | >      }
>>>> | > | >      return 0;
>>>> | > | >  }
>>>> | > | >
>>>> | > | >
>>>> | > | > When I tried to remove it, it just stop complaining about
>>>> operation 6,
>>>> | > but
>>>> | > | > the core dump is still present.
>>>> | > | >
>>>> | > | > May I ask you for help with this issue? Do you have any idea,
>> what
>>>> | > causing
>>>> | > | > this issue in 5.8 and how to fix it?
>>>> | > | > I know, that Jan Safranek has fixed this for 5.7 by commit [2],
>>>> but it
>>>> | > | > looks like something other has changed and this issue is current
>>>> again.
>>>> | > | >
>>>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
>>>> | > | > [2]
>>>> | > | >
>>>> | >
>>>>
>> https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
>>>> | > | >
>>>> | > | > Regards
>>>> | > | >
>>>> | > | > Josef Ridky
>>>> | > | > Software Engineer
>>>> | > | > Core Services Team
>>>> | > | > Red Hat Czech, s.r.o.
>>>> | > | >
>>>> | > | >
>>>> | > | >
>>>> | > | > _______________________________________________
>>>> | > | > Net-snmp-coders mailing list
>>>> | > | > Net-snmp-coders@lists.sourceforge.net
>>>> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
>>>> | > | >
>>>> | > |
>>>> | >
>>>> |
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Net-snmp-coders mailing list
>>> Net-snmp-coders@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
>>>
>>
> 
From: Masayoshi Mizuma <m.miz...@jp.fujitsu.com>
Date: Mon, 8 Apr 2019 19:56:04 -0400
Subject: [PATCH v2] snmplib/snmp_api: Remove the request on the session when the
 sending is failed

snmpd is terminated abnormally due to an invalid memory access after
the sending of a request is failed.

The time out callback for the failed request is executed when the
session is closing because the request remains in the internal session.
The cleanup for the request is executed on the
callback(NETSNMP_CALLBACK_OP_SEND_FAILED,) and also on the time out
callback(NETSNMP_CALLBACK_OP_TIMED_OUT,), so the wrong memory access
happens.

Remove the failed request from the internal session after the callback
for the failed request is done.

Signed-off-by: Masayoshi Mizuma <m.miz...@jp.fujitsu.com>
Reported-by: Shogo Matsumoto <shogo.matsum...@jp.fujitsu.com>
---
 snmplib/snmp_api.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/snmplib/snmp_api.c b/snmplib/snmp_api.c
index 554767a83..76332d9e1 100644
--- a/snmplib/snmp_api.c
+++ b/snmplib/snmp_api.c
@@ -352,6 +352,7 @@ static int      snmpv3_build(u_char ** pkt, size_t * pkt_len,
                              netsnmp_pdu *pdu);
 static int      snmp_parse_version(u_char *, size_t);
 static int      snmp_resend_request(struct session_list *slp,
+                                    netsnmp_request_list *orp,
                                     netsnmp_request_list *rp,
                                     int incr_retries);
 static void     register_default_handlers(void);
@@ -5717,7 +5718,7 @@ _sess_process_packet_handle_pdu(void *sessp, netsnmp_session * sp,
 	     * * inifinite resend                      
 	     */
 	    if (rp->retries <= sp->retries) {
-	      snmp_resend_request(slp, rp, TRUE);
+	      snmp_resend_request(slp, orp, rp, TRUE);
 	      break;
 	    } else {
 	      /* We're done with retries, so no longer waiting for a response */
@@ -6662,9 +6663,22 @@ snmp_timeout(void)
     snmp_res_unlock(MT_LIBRARY_ID, MT_LIB_SESSION);
 }
 
+static void
+remove_request(struct snmp_internal_session *isp,
+               netsnmp_request_list *orp, netsnmp_request_list *rp)
+{
+    if (orp)
+        orp->next_request = rp->next_request;
+    else
+        isp->requests = rp->next_request;
+    if (isp->requestsEnd == rp)
+        isp->requestsEnd = orp;
+    snmp_free_pdu(rp->pdu);
+}
+
 static int
-snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
-                    int incr_retries)
+snmp_resend_request(struct session_list *slp, netsnmp_request_list *orp,
+                    netsnmp_request_list *rp, int incr_retries)
 {
     struct snmp_internal_session *isp;
     netsnmp_session *sp;
@@ -6731,9 +6745,11 @@ snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
         sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
         sp->s_errno = errno;
         snmp_set_detail(strerror(errno));
-        if (rp->callback)
+        if (rp->callback) {
             rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
                          rp->pdu->reqid, rp->pdu, rp->cb_data);
+            remove_request(isp, orp, rp);
+	}
         return -1;
     } else {
         netsnmp_get_monotonic_clock(&now);
@@ -6813,17 +6829,11 @@ snmp_sess_timeout(void *sessp)
                     callback(NETSNMP_CALLBACK_OP_TIMED_OUT, sp,
                              rp->pdu->reqid, rp->pdu, magic);
                 }
-                if (orp)
-                    orp->next_request = rp->next_request;
-                else
-                    isp->requests = rp->next_request;
-                if (isp->requestsEnd == rp)
-                    isp->requestsEnd = orp;
-                snmp_free_pdu(rp->pdu);
+                remove_request(isp, orp, rp);
                 freeme = rp;
                 continue;       /* don't update orp below */
             } else {
-                if (snmp_resend_request(slp, rp, TRUE)) {
+                if (snmp_resend_request(slp, orp, rp, TRUE)) {
                     break;
                 }
             }
-- 
2.20.1

_______________________________________________
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders

Reply via email to