Now it works fine!
thx
Anders Wallin
On Tue, Apr 9, 2019 at 2:26 AM Masayoshi Mizuma <msys.miz...@gmail.com> wrote:

> Hi Anders,
>
> Thank you for your feedback!
> I attach the v2 patch. Could you try it?
>
> In the v1 patch, I missed the check for the request callback, so the
> request gets removed even though the callback doesn't run.
>
> Thanks,
> Masa
>
> On 4/8/19 11:06 AM, Anders Wallin wrote:
> > Hi Masa,
> >
> > It looks like it solves the problem reported by Josef, BUT it breaks
> > DTLSUDP. I ran the tests without analyzing why.
> > To reproduce the issue, I did the following using the net-snmp master
> > branch plus these patches:
> > 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the sending is failed (10 minutes ago) <Masayoshi Mizuma>
> > 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders Wallin>
> > a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days ago) <Anders Wallin>
> > eaad09d04 - (origin/master, origin/HEAD, master) Merge branch 'V5-8-patches' (9 weeks ago) <Bart Van Assche>
> >
> > $ ./configure --prefix=/usr \
> >     --with-persistent-directory=/var/lib/net-snmp \
> >     --with-mib-modules='smux tlstm-mib tsm-mib examples/example examples/notification' \
> >     --with-security-modules="tsm" \
> >     --with-transports="TLSTCP DTLSUDP" \
> >     --enable-shared \
> >     --with-defaults \
> >     --enable-ipv6 \
> >     --with-cflags="-g -O2" \
> >     --without-elf
> >
> > $ make install
> > $ cd testing
> > $ ./RUNFULLTESTS -g tls
> > DTLS-UDP user certificate tests .......................... 41/?
> > This hangs forever at "41" with snmpd.log saying....
> > ......
> > 2019-04-08 16:29:11
> > 2019-04-08 16:29:11
> > Received 0 byte packet from DTLSUDP: unknown
> > 2019-04-08 16:29:11
> > 2019-04-08 16:29:13
> > Received 0 byte packet from DTLSUDP: unknown
> > 2019-04-08 16:29:13
> > 2019-04-08 16:29:15
> > Received 0 byte packet from DTLSUDP: unknown
> > 2019-04-08 16:29:15
> > 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 depth=0 err=18:self signed certificate
> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 1 (SSL_ERROR_SSL)
> > 2019-04-08 16:29:15 TLS Error: certificate verify failed
> > 2019-04-08 16:29:15 ---- End of OpenSSL Errors ----
> > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ----
> > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
> > 2019-04-08 16:29:15 TLS Error: (null)
> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
> > 2019-04-08 16:29:16 TLS Error: (null)
> > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ----
> > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 (SSL_ERROR_SYSCALL): system_error=0 (Success)
> > 2019-04-08 16:29:16 TLS Error: (null)
> >
> > With the fix suggested by Josef I don't see the DTLSUDP problem, but maybe
> > there are other problems.
> >
> > Regards
> > Anders Wallin
> >
> > PS. thx for adding the commit info to a420d87d3; I updated the patch with
> > your commit comments.
> >
> > On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com>
> > wrote:
> >
> >> Hi Josef,
> >>
> >> I attach two patches to fix the memory inconsistency when a request is
> >> resent and then times out.
> >> Could you try them?
> >>
> >> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch
> >>
> >> This patch was posted by Anders, and I tried to add the description.
> >> This patch adds the missing handling of the NETSNMP_CALLBACK_OP_RESEND callback.
> >>
> >> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch
> >>
> >> This patch fixes the race between the NETSNMP_CALLBACK_OP_SEND_FAILED
> >> and NETSNMP_CALLBACK_OP_TIMED_OUT callbacks. If sending the request fails,
> >> the request is removed from the internal session.
> >>
> >> Thanks,
> >> Masa
> >>
> >> On 4/3/19 9:34 AM, Anders Wallin wrote:
> >>> The introduction of that code fixes another issue:
> >>> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020
> >>> Author: Bill Fenner <fen...@gmail.com>
> >>> Date: Wed Dec 20 21:52:10 2017 +0000
> >>>
> >>> NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 agent
> >>>
> >>> With the patch in 1214, the snmp_api code assumed that if magic was
> >>> set, it was the "struct synch-state" from snmp_client. Of course,
> >>> magic belongs to the caller, and the perl library uses it differently,
> >>> so reaching into it is verboten. Introduce a new callback (that
> >>> was already introduced in 5.8) to report this "retries exceeded"
> >>> state, and use it in snmp_client."
> >>>
> >>> I think the problem is really about shutting down the agentx connection
> >>> when one (1) response is too late. I have made two patches (one that only
> >>> writes a better log message and one that removes the "bad" code).
> >>> With these patches I don't get any crash. I think that 5.7.3 has this issue
> >>> as well, but it cannot be crashed with the agentofdeath code.
> >>>
> >>> Can you please try this?
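Masa describes patch 0002 as removing a request whose send failed. His v2 note adds that the request should be removed only when its callback actually runs. Together these can be sketched in C as below. This is a minimal model built on our reading of those two messages, not the net-snmp patch. `struct session`, `struct pending_req`, `add_request()`, `resend_failed()`, `expire_all()`, `count_calls()` and the simplified callback signature are hypothetical stand-ins, and the operation values are assumed.

The sketch shows one property: once a send failure has been reported and the request unlinked, the timeout sweep can no longer report it a second time, so it cannot release the caller's state twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the net-snmp callback operations (values assumed). */
enum { OP_TIMED_OUT = 2, OP_SEND_FAILED = 3 };

typedef int (*req_callback)(int op, long reqid, void *cb_data);

struct pending_req {
    long reqid;
    req_callback callback;      /* may be NULL */
    void *cb_data;              /* owned by the caller */
    struct pending_req *next;
};

struct session {
    struct pending_req *requests;   /* outstanding requests */
};

/* Queue a new outstanding request at the head of the session's list. */
static struct pending_req *add_request(struct session *s, long reqid,
                                       req_callback cb, void *cb_data)
{
    struct pending_req *rp = malloc(sizeof *rp);
    if (!rp)
        return NULL;
    rp->reqid = reqid;
    rp->callback = cb;
    rp->cb_data = cb_data;
    rp->next = s->requests;
    s->requests = rp;
    return rp;
}

/* Unlink rp from the session and free it. */
static void remove_request(struct session *s, struct pending_req *rp)
{
    struct pending_req **pp;
    for (pp = &s->requests; *pp; pp = &(*pp)->next) {
        if (*pp == rp) {
            *pp = rp->next;
            free(rp);
            return;
        }
    }
}

/* Resend failed: report the failure once through the callback, then drop
 * the request. Without a callback nothing was reported, so the request
 * stays queued and is left to the timeout path. */
static int resend_failed(struct session *s, struct pending_req *rp)
{
    if (rp->callback) {
        rp->callback(OP_SEND_FAILED, rp->reqid, rp->cb_data);
        remove_request(s, rp);
    }
    return -1;
}

/* Timeout sweep: every request still queued gets one TIMED_OUT callback. */
static void expire_all(struct session *s)
{
    while (s->requests) {
        struct pending_req *rp = s->requests;
        s->requests = rp->next;
        if (rp->callback)
            rp->callback(OP_TIMED_OUT, rp->reqid, rp->cb_data);
        free(rp);
    }
}

/* Example callback: counts how often it is invoked. */
static int count_calls(int op, long reqid, void *cb_data)
{
    (void)op;
    (void)reqid;
    assert(cb_data != NULL);
    ++*(int *)cb_data;
    return 1;
}
```

With a callback present, `resend_failed()` fires the failure once, and a later `expire_all()` finds nothing left to report. With no callback, the request stays queued and only the timeout path reports it.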
> >>>
> >>> Regards
> >>> Anders Wallin
> >>>
> >>> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I have compared net-snmp-5.7.3 and net-snmp-5.8, and I have found that
> >>>> the following callbacks in snmplib/snmp_api.c cause the core dump issue:
> >>>>
> >>>> --- old/snmplib/snmp_api.c 2019-04-03 12:13:55.126769866 +0200
> >>>> +++ new/snmplib/snmp_api.c 2019-04-03 12:15:18.353420790 +0200
> >>>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list
> >>>>          sp->s_snmp_errno = SNMPERR_BAD_SENDTO;
> >>>>          sp->s_errno = errno;
> >>>>          snmp_set_detail(strerror(errno));
> >>>> -        if (rp->callback)
> >>>> +/*      if (rp->callback)
> >>>>              rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp,
> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
> >>>>          return -1;
> >>>>      } else {
> >>>>          netsnmp_get_monotonic_clock(&now);
> >>>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list
> >>>>          tv.tv_sec += tv.tv_usec / 1000000L;
> >>>>          tv.tv_usec %= 1000000L;
> >>>>          rp->expireM = tv;
> >>>> -        if (rp->callback)
> >>>> +/*      if (rp->callback)
> >>>>              rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
> >>>> -                         rp->pdu->reqid, rp->pdu, rp->cb_data);
> >>>> +                         rp->pdu->reqid, rp->pdu, rp->cb_data);*/
> >>>>      }
> >>>>      return 0;
> >>>>  }
> >>>>
> >>>> Without them, everything works as expected.
> >>>>
> >>>> Josef Ridky
> >>>> Software Engineer
> >>>> Core Services Team
> >>>> Red Hat Czech, s.r.o.
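The context lines in Josef's diff show how `snmp_resend_request()` normalizes the retransmission expiry before storing it in `rp->expireM`. The arithmetic can be sketched on its own as below. The `next_expiry()` helper is hypothetical. The step that adds a timeout in microseconds to `tv_usec` is an assumption, since that line is not part of the quoted hunk; only the two normalization lines are quoted.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper: return now + timeout_us as a normalized timeval.
 * The carry step mirrors the two quoted lines from snmp_resend_request();
 * adding the timeout to tv_usec first is an assumption. */
static struct timeval next_expiry(struct timeval now, long timeout_us)
{
    struct timeval tv = now;

    assert(timeout_us >= 0);
    tv.tv_usec += timeout_us;              /* may now exceed one second */
    tv.tv_sec += tv.tv_usec / 1000000L;    /* carry whole seconds */
    tv.tv_usec %= 1000000L;                /* keep 0 <= tv_usec < 1000000 */
    return tv;
}
```

For example, adding a 200 ms timeout to 10.900000 s gives 1,100,000 microseconds in `tv_usec`. The carry moves one whole second into `tv_sec`, so the expiry becomes 11.100000 s.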
> >>>>
> >>>> ----- Original Message -----
> >>>> | From: "Anders Wallin" <walli...@gmail.com>
> >>>> | To: "Josef Ridky" <jri...@redhat.com>
> >>>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
> >>>> | Sent: Tuesday, April 2, 2019 6:27:54 PM
> >>>> | Subject: Re: Core dump with net-snmp-5.8
> >>>> |
> >>>> | Hi Josef,
> >>>> | I can reproduce the issue using the master branch. I will take a look at it
> >>>> | later tonight or tomorrow.
> >>>> |
> >>>> | Regards
> >>>> | Anders Wallin
> >>>> |
> >>>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> wrote:
> >>>> |
> >>>> | > Hi,
> >>>> | >
> >>>> | > Thanks for your patch. Unfortunately, even after applying it, snmpd
> >>>> | > still ends with a core dump due to 'double free or corruption (fasttop)'.
> >>>> | >
> >>>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with:
> >>>> | >
> >>>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5)
> >>>> | > snmp_agent: delegate session == 0x56207e165240
> >>>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240
> >>>> | > agentx/master: callback resend
> >>>> | > agentx/master: callback resend
> >>>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9
> >>>> | > agentx/master: close 0x56207dfd5400, -1
> >>>> | > snmp_agent: removed 40 delegated request(s) for session 0x56207dfce490
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e165240
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e165240
> >>>> | > snmp_agent: REMOVE session == 0x56207e165240
> >>>> | > snmp_agent: agent_session 0x56207e165240 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0
> >>>> | > snmp_agent: REMOVE session == 0x56207e1041a0
> >>>> | > snmp_agent: agent_session 0x56207e1041a0 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0
> >>>> | > snmp_agent: REMOVE session == 0x56207e1656c0
> >>>> | > snmp_agent: agent_session 0x56207e1656c0 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11af40
> >>>> | > snmp_agent: REMOVE session == 0x56207e11af40
> >>>> | > snmp_agent: agent_session 0x56207e11af40 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e118f00
> >>>> | > snmp_agent: REMOVE session == 0x56207e118f00
> >>>> | > snmp_agent: agent_session 0x56207e118f00 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11b540
> >>>> | > snmp_agent: REMOVE session == 0x56207e11b540
> >>>> | > snmp_agent: agent_session 0x56207e11b540 released
> >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00
> >>>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00
> >>>> | > snmp_agent: REMOVE session == 0x56207e11bd00
> >>>> | > snmp_agent: agent_session 0x56207e11bd00 released
> >>>> | > agentx/master: Continue removing delegated subsession reqests
> >>>> | > agentx/master: close transport
> >>>> | > snmp_agent: REMOVE session == 0x56207dfd5400
> >>>> | > agentx/master: response too late on session 0x56207dfd5400
> >>>> | > agentx/master: response too late on session 0x56207dfd5400
> >>>> | > double free or corruption (fasttop)
> >>>> | > Aborted (core dumped)
> >>>> | >
> >>>> | > Interestingly, when I run it with -DALL it passes (at least for several
> >>>> | > rounds). It looks like some strange race condition.
> >>>> | >
> >>>> | > Regards
> >>>> | >
> >>>> | > Josef Ridky
> >>>> | > Software Engineer
> >>>> | > Core Services Team
> >>>> | > Red Hat Czech, s.r.o.
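Josef's trace ends with two "response too late" messages followed by "double free or corruption". The thread does not pin down the exact path that frees twice. The following is therefore only a generic C ownership pattern, not net-snmp code.

The rule it illustrates: a response may free request state only after detaching the request from the outstanding list. A response that arrives after a timeout or session close then finds nothing to detach, so it cannot free anything a second time. `struct outstanding`, `track_request()`, `take_request()` and `handle_response()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical outstanding-request record; the entry owns its state. */
struct outstanding {
    long reqid;
    char *state;
    struct outstanding *next;
};

/* Register an outstanding request with freshly allocated state. */
static int track_request(struct outstanding **head, long reqid)
{
    struct outstanding *req = malloc(sizeof *req);
    if (!req)
        return 0;
    req->state = malloc(16);
    if (!req->state) {
        free(req);
        return 0;
    }
    req->reqid = reqid;
    req->next = *head;
    *head = req;
    return 1;
}

/* Detach the entry for reqid, or return NULL if it is already gone. */
static struct outstanding *take_request(struct outstanding **head, long reqid)
{
    struct outstanding **pp;
    assert(head != NULL);
    for (pp = head; *pp; pp = &(*pp)->next) {
        if ((*pp)->reqid == reqid) {
            struct outstanding *found = *pp;
            *pp = found->next;
            found->next = NULL;
            return found;
        }
    }
    return NULL;
}

/* Returns 1 if the response matched a live request (which is then freed),
 * 0 if the request was already completed or torn down ("too late"). */
static int handle_response(struct outstanding **head, long reqid)
{
    struct outstanding *req = take_request(head, reqid);
    if (!req) {
        fprintf(stderr, "response too late for reqid %ld\n", reqid);
        return 0;   /* nothing is freed on this path */
    }
    free(req->state);
    free(req);
    return 1;
}
```

Because freeing happens only on the path that successfully detaches the entry, a duplicate or late response is just logged and ignored.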
> >>>> | >
> >>>> | > ----- Original Message -----
> >>>> | > | From: "Anders Wallin" <walli...@gmail.com>
> >>>> | > | To: "Josef Ridky" <jri...@redhat.com>
> >>>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net>
> >>>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM
> >>>> | > | Subject: Re: Core dump with net-snmp-5.8
> >>>> | > |
> >>>> | > | Hi Josef,
> >>>> | > |
> >>>> | > | I think it's the same issue as https://sourceforge.net/p/net-snmp/bugs/2914/
> >>>> | > | (where I also posted the solution).
> >>>> | > | Regards
> >>>> | > | Anders Wallin
> >>>> | > |
> >>>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> wrote:
> >>>> | > |
> >>>> | > | > Hi,
> >>>> | > | >
> >>>> | > | > Recently, I have hit an issue in net-snmp-5.8 that is connected to
> >>>> | > | > bug report [1].
> >>>> | > | >
> >>>> | > | > When I run the agentofdeath test from [1], the snmpd daemon crashes
> >>>> | > | > with a "malloc(): smallbin double linked list corrupted" or double
> >>>> | > | > free() error and dumps core (see below).
> >>>> | > | > From the log file, I identified one issue with "Unknown operation".
> >>>> | > | >
> >>>> | > | > This issue is in the agentx_got_response function
> >>>> | > | > (agent/mibgroup/agentx/master.c). No action is implemented for
> >>>> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in
> >>>> | > | > include/net-snmp/library/snmp_api.h).
> >>>> | > | > As a result, "Unknown operation 6 in agentx_got_response" is shown in
> >>>> | > | > the log file.
> >>>> | > | >
> >>>> | > | > /var/log/messages
> >>>> | > | > -------------------------------
> >>>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in agentx_got_response
> >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin double linked list corrupted
> >>>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core Dump (PID 13652/UID 0).
> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
> >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed with result 'core-dump'.
> >>>> | > | > -------------------------------
> >>>> | > | >
> >>>> | > | > The "Unknown operation" callback is caused by a newly added piece of
> >>>> | > | > code in snmplib/snmp_api.c:
> >>>> | > | >
> >>>> | > | > static int
> >>>> | > | > snmp_resend_request(struct session_list *slp, netsnmp_request_list *rp,
> >>>> | > | >                     int incr_retries)
> >>>> | > | > {
> >>>> | > | >
> >>>> | > | > ...
> >>>> | > | >
> >>>> | > | >         tv.tv_sec += tv.tv_usec / 1000000L;
> >>>> | > | >         tv.tv_usec %= 1000000L;
> >>>> | > | >         rp->expireM = tv;
> >>>> | > | > +       if (rp->callback)
> >>>> | > | > +           rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp,
> >>>> | > | > +                        rp->pdu->reqid, rp->pdu, rp->cb_data);
> >>>> | > | >     }
> >>>> | > | >     return 0;
> >>>> | > | > }
> >>>> | > | >
> >>>> | > | > When I removed it, snmpd just stopped complaining about operation 6,
> >>>> | > | > but the core dump is still present.
> >>>> | > | >
> >>>> | > | > May I ask you for help with this issue? Do you have any idea what is
> >>>> | > | > causing this issue in 5.8 and how to fix it?
> >>>> | > | > I know that Jan Safranek fixed this for 5.7 in commit [2], but it
> >>>> | > | > looks like something else has changed and this issue has come back.
> >>>> | > | >
> >>>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/
> >>>> | > | > [2] https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df
> >>>> | > | >
> >>>> | > | > Regards
> >>>> | > | >
> >>>> | > | > Josef Ridky
> >>>> | > | > Software Engineer
> >>>> | > | > Core Services Team
> >>>> | > | > Red Hat Czech, s.r.o.
> >>>> | > | >
> >>>> | > | > _______________________________________________
> >>>> | > | > Net-snmp-coders mailing list
> >>>> | > | > Net-snmp-coders@lists.sourceforge.net
> >>>> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
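Josef's original report says agentx_got_response has no case for NETSNMP_CALLBACK_OP_RESEND. Anders' commit "BUG2914: Agent master needs to treat resend as normal" and Masa's patch 0001 ("Return when NETSNMP_CALLBACK_OP_RESEND") address that gap. A simplified callback dispatcher can illustrate the idea. This is a hedged sketch, not `agentx_got_response()` itself. The operation constants are local stand-ins: only the value 6 for the resend operation is confirmed by the "Unknown operation 6" log, and the other values are assumed. `got_response_sketch()` and its return codes are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-ins for the callback operation codes. Only OP_RESEND == 6
 * is confirmed by the log in this thread; the other values are assumed. */
enum {
    OP_RECEIVED_MESSAGE = 1,
    OP_TIMED_OUT        = 2,
    OP_SEND_FAILED      = 3,
    OP_RESEND           = 6
};

/* Hypothetical outcomes of a response callback. */
enum outcome { GOT_RESPONSE, GOT_FAILURE, IGNORED_RESEND, UNKNOWN_OP };

/* Treat a resend as a normal, non-final event instead of letting it
 * fall through to the "unknown operation" branch. */
static enum outcome got_response_sketch(int op)
{
    switch (op) {
    case OP_RESEND:
        /* The request is still outstanding: touch nothing, just return. */
        return IGNORED_RESEND;
    case OP_TIMED_OUT:
    case OP_SEND_FAILED:
        /* Final failure: real code would clean up here, exactly once. */
        return GOT_FAILURE;
    case OP_RECEIVED_MESSAGE:
        return GOT_RESPONSE;
    default:
        fprintf(stderr, "Unknown operation %d in got_response_sketch\n", op);
        return UNKNOWN_OP;
    }
}
```

Returning early on a resend keeps the request and its state alive. Cleanup runs only for a genuinely final event: a response, a timeout, or a send failure.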