Hi folks, thanks for your solution, I have tested it internally and all works as expected. I hope, this will be soon part of net-snmp-5.8.
Regards Josef Ridky Software Engineer Core Services Team Red Hat Czech, s.r.o. ----- Original Message ----- | From: "Anders Wallin" <walli...@gmail.com> | To: "Masayoshi Mizuma" <msys.miz...@gmail.com> | Cc: "Josef Ridky" <jri...@redhat.com>, "Net-SNMP Coders" <net-snmp-coders@lists.sourceforge.net> | Sent: Tuesday, April 9, 2019 12:54:09 PM | Subject: Re: Core dump with net-snmp-5.8 | | Now it works fine! | | thx | Anders Wallin | | | On Tue, Apr 9, 2019 at 2:26 AM Masayoshi Mizuma <msys.miz...@gmail.com> | wrote: | | > Hi Anders, | > | > Thank you for your feedback! | > I attach the v2 patch. Could you try it? | > | > On the v1 patch, I missed the check for the request callback. So, the | > request | > gets removed even though the callback doesn't run. | > | > Thanks, | > Masa | > | > On 4/8/19 11:06 AM, Anders Wallin wrote: | > > Hi Masa, | > > | > > looks like it solves the problem reported by Josef, BUT it breaks | > DTLSUDP. | > > I run the tests w/o analyzing why. | > > To reproduce the issue I did the following using net-snmp master branch, | > > plus these patches | > > 39485c6f2 - snmplib/snmp_api: Remove the request on the session when the | > > sending is failed (10 minutes ago) <Masayoshi Mizuma> | > > 06a4d52d8 - agentx: logging to late responses (5 days ago) <Anders | > Wallin> | > > a420d87d3 - BUG2914: Agent master needs to treat resend as normal (5 days | > > ago) <Anders Wallin> | > > eaad09d04 - (origin/master, origin/HEAD, master) Merge branch | > > 'V5-8-patches' (9 weeks ago) <Bart Van Assche> | > > | > > $ ./configure --prefix=/usr \ | > > --with-persistent-directory=/var/lib/net-snmp \ | > > --with-mib-modules='smux tlstm-mib tsm-mib | > examples/example | > > examples/notification' \ | > > --with-security-modules="tsm" \ | > > --with-transports="TLSTCP DTLSUDP" \ | > > --enable-shared \ | > > --with-defaults \ | > > --enable-ipv6 \ | > > --with-cflags="-g -O2" \ | > > --without-elf | > > | > > $ make install | > > $ cd testing | > > $ ./RUNFULLTESTS -g tls | > > DTLS-UDP user certificate tests .......................... 41/? | > > This hangs forever in "41" with snmpd.log saying.... | > > ...... | > > 2019-04-08 16:29:11 | > > 2019-04-08 16:29:11 | > > Received 0 byte packet from DTLSUDP: unknown | > > 2019-04-08 16:29:11 | > > 2019-04-08 16:29:13 | > > Received 0 byte packet from DTLSUDP: unknown | > > 2019-04-08 16:29:13 | > > 2019-04-08 16:29:15 | > > Received 0 byte packet from DTLSUDP: unknown | > > 2019-04-08 16:29:15 | > > 2019-04-08 16:29:15 tls verification failure: ok=0 ctx=0x55ee625b4170 | > > depth=0 err=18:self signed certificate | > > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ---- | > > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 1 | > > (SSL_ERROR_SSL) | > > 2019-04-08 16:29:15 TLS Error: certificate verify failed | > > 2019-04-08 16:29:15 ---- End of OpenSSL Errors ---- | > > 2019-04-08 16:29:15 ---- OpenSSL Related Errors: ---- | > > 2019-04-08 16:29:15 TLS error: SSL_read: rc=-1, sslerror = 5 | > > (SSL_ERROR_SYSCALL): system_error=0 (Success) | > > 2019-04-08 16:29:15 TLS Error: (null) | > > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ---- | > > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 | > > (SSL_ERROR_SYSCALL): system_error=0 (Success) | > > 2019-04-08 16:29:16 TLS Error: (null) | > > 2019-04-08 16:29:16 ---- OpenSSL Related Errors: ---- | > > 2019-04-08 16:29:16 TLS error: SSL_read: rc=-1, sslerror = 5 | > > (SSL_ERROR_SYSCALL): system_error=0 (Success) | > > 2019-04-08 16:29:16 TLS Error: (null) | > > | > > With the fix suggested på Josef I don't see the DTLSUDP problem, but | > maybe | > > there are other problems. | > > | > > Regards | > > Anders Wallin | > > | > > PS. thx for adding commit info to a420d87d3, I updated the patch with | > your | > > commit comments | > > | > > | > > On Mon, Apr 8, 2019 at 3:27 PM Masayoshi Mizuma <msys.miz...@gmail.com> | > > wrote: | > > | > >> Hi Josef, | > >> | > >> I attach two patches to fix the memory inconsistency if the request is | > >> resend and timed out. | > >> Could you try them? | > >> | > >> - 0001-agentx-master-Return-when-NETSNMP_CALLBACK_OP_RESEND.patch | > >> | > >> This patch was posted by Anders, and I tried to add the description. | > >> This patch fixes the missing NETSNMP_CALLBACK_OP_RESEND callback. | > >> | > >> - 0002-snmplib-snmp_api-Remove-the-request-on-the-session-w.patch | > >> | > >> This patch fixes the race between NETSNMP_CALLBACK_OP_SEND_FAILED | > >> and NETSNMP_CALLBACK_OP_TIMED_OUT callback. If the request is failed, | > >> then remove the request from the internal session. | > >> | > >> Thanks, | > >> Masa | > >> | > >> On 4/3/19 9:34 AM, Anders Wallin wrote: | > >>> The introduction of that code fixes another issue; | > >>> "commit 56c30b11f3616ea4f0c38a21e08e78f050096020 | > >>> Author: Bill Fenner <fen...@gmail.com> | > >>> Date: Wed Dec 20 21:52:10 2017 +0000 | > >>> | > >>> NEWS: snmplib: PATCH: 1349: Fix perl/other crash against bad SNMPv3 | > >>> agent | > >>> | > >>> With the patch in 1214, the snmp_api code assumed that if magic was | > >>> set, it was the "struct synch-state" from snmp_client. Of course, | > >>> magic belongs to the caller, and the perl library uses it | > >> differently, | > >>> so reaching into it is verboten. Introduce a new callback (that | > >>> was already introduced in 5.8) to report this "retries exceeded" | > >>> state, and use it in snmp_client." | > >>> | > >>> I think the problem is really about shutting down the agentx connection | > >>> when one(1) response is to late. I have | > >>> done 2 patches (one that only write a better log message and one that | > >>> removes the "bad" code. | > >>> With these patches I don't get any crash. I think that 5.7.3 has this | > >> issue | > >>> as well, but it can not be crashed with the agentofdead code | > >>> | > >>> Can you please try this? | > >>> | > >>> Regards | > >>> Anders Wallin | > >>> | > >>> | > >>> On Wed, Apr 3, 2019 at 12:35 PM Josef Ridky <jri...@redhat.com> wrote: | > >>> | > >>>> Hi, | > >>>> | > >>>> I have compared net-snmp-5.7.3 and net-snmp-5.8 and I have found, that | > >>>> following callbacks in snmplib/snmp_api.c causes the core dump issue: | > >>>> | > >>>> --- old/snmplib/snmp_api.c 2019-04-03 12:13:55.126769866 +0200 | > >>>> +++ new/snmplib/snmp_api.c 2019-04-03 12:15:18.353420790 +0200 | > >>>> @@ -6731,9 +6731,9 @@ snmp_resend_request(struct session_list | > >>>> sp->s_snmp_errno = SNMPERR_BAD_SENDTO; | > >>>> sp->s_errno = errno; | > >>>> snmp_set_detail(strerror(errno)); | > >>>> - if (rp->callback) | > >>>> +/* if (rp->callback) | > >>>> rp->callback(NETSNMP_CALLBACK_OP_SEND_FAILED, sp, | > >>>> - rp->pdu->reqid, rp->pdu, rp->cb_data); | > >>>> + rp->pdu->reqid, rp->pdu, rp->cb_data);*/ | > >>>> return -1; | > >>>> } else { | > >>>> netsnmp_get_monotonic_clock(&now); | > >>>> @@ -6743,9 +6743,9 @@ snmp_resend_request(struct session_list | > >>>> tv.tv_sec += tv.tv_usec / 1000000L; | > >>>> tv.tv_usec %= 1000000L; | > >>>> rp->expireM = tv; | > >>>> - if (rp->callback) | > >>>> +/* if (rp->callback) | > >>>> rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp, | > >>>> - rp->pdu->reqid, rp->pdu, rp->cb_data); | > >>>> + rp->pdu->reqid, rp->pdu, rp->cb_data);*/ | > >>>> } | > >>>> return 0; | > >>>> } | > >>>> | > >>>> Without them, all works as expected. | > >>>> | > >>>> Josef Ridky | > >>>> Software Engineer | > >>>> Core Services Team | > >>>> Red Hat Czech, s.r.o. | > >>>> | > >>>> ----- Original Message ----- | > >>>> | From: "Anders Wallin" <walli...@gmail.com> | > >>>> | To: "Josef Ridky" <jri...@redhat.com> | > >>>> | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net> | > >>>> | Sent: Tuesday, April 2, 2019 6:27:54 PM | > >>>> | Subject: Re: Core dump with net-snmp-5.8 | > >>>> | | > >>>> | Hi Josef, | > >>>> | I can reproduce the issue using the master branch, I will take a | > look | > >> at | > >>>> it | > >>>> | later tonight or tomorrow | > >>>> | | > >>>> | Regards | > >>>> | Anders Wallin | > >>>> | | > >>>> | | > >>>> | On Tue, Apr 2, 2019 at 3:42 PM Josef Ridky <jri...@redhat.com> | > wrote: | > >>>> | | > >>>> | > Hi, | > >>>> | > | > >>>> | > thanks for your patch. Unfortunately, even when I have applied it, | > >> it | > >>>> | > still ends with core dump due of 'double free or corruption | > >> (fasttop)' | > >>>> | > | > >>>> | > When I run snmpd with -Dsnmp_agent,agentx/master it ends with: | > >>>> | > | > >>>> | > agentx/master: sending pdu (req=0x1d4,trans=0x1d3,sess=0x5) | > >>>> | > snmp_agent: delegate session == 0x56207e165240 | > >>>> | > snmp_agent: end of handle_snmp_packet, asp = 0x56207e165240 | > >>>> | > agentx/master: callback resend | > >>>> | > agentx/master: callback resend | > >>>> | > agentx/master: timeout on session 0x56207dfd5400 req=0x1c9 | > >>>> | > agentx/master: close 0x56207dfd5400, -1 | > >>>> | > snmp_agent: removed 40 delegated request(s) for session | > >> 0x56207dfce490 | > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e165240 | > >>>> | > snmp_agent: canceling next walk for asp 0x56207e165240 | > >>>> | > snmp_agent: REMOVE session == 0x56207e165240 | > >>>> | > snmp_agent: agent_session 0x56207e165240 released | > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1041a0 | > >>>> | > snmp_agent: canceling next walk for asp 0x56207e1041a0 | > >>>> | > snmp_agent: REMOVE session == 0x56207e1041a0 | > >>>> | > snmp_agent: agent_session 0x56207e1041a0 released | > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e1656c0 | > >>>> | > snmp_agent: canceling next walk for asp 0x56207e1656c0 | > >>>> | > snmp_agent: REMOVE session == 0x56207e1656c0 | > >>>> | > snmp_agent: agent_session 0x56207e1656c0 released | > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11af40 | > >>>> | > snmp_agent: canceling next walk for asp 0x56207e11af40 | > >>>> | > snmp_agent: REMOVE session == 0x56207e11af40 | > >>>> | > snmp_agent: agent_session 0x56207e11af40 released | > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e118f00 | > >>>> | > snmp_agent: canceling next walk for asp 0x56207e118f00 | > >>>> | > snmp_agent: REMOVE session == 0x56207e118f00 | > >>>> | > snmp_agent: agent_session 0x56207e118f00 released | > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11b540 | > >>>> | > snmp_agent: canceling next walk for asp 0x56207e11b540 | > >>>> | > snmp_agent: REMOVE session == 0x56207e11b540 | > >>>> | > snmp_agent: agent_session 0x56207e11b540 released | > >>>> | > snmp_agent: processing delegated request, asp = 0x56207e11bd00 | > >>>> | > snmp_agent: canceling next walk for asp 0x56207e11bd00 | > >>>> | > snmp_agent: REMOVE session == 0x56207e11bd00 | > >>>> | > snmp_agent: agent_session 0x56207e11bd00 released | > >>>> | > agentx/master: Continue removing delegated subsession reqests | > >>>> | > agentx/master: close transport | > >>>> | > snmp_agent: REMOVE session == 0x56207dfd5400 | > >>>> | > agentx/master: response too late on session 0x56207dfd5400 | > >>>> | > agentx/master: response too late on session 0x56207dfd5400 | > >>>> | > double free or corruption (fasttop) | > >>>> | > Aborted (core dumped) | > >>>> | > | > >>>> | > | > >>>> | > What's interesting, when I run it with -DALL it pass (at least for | > >>>> several | > >>>> | > rounds). | > >>>> | > It looks like some strange race condition. | > >>>> | > | > >>>> | > Regards | > >>>> | > | > >>>> | > Josef Ridky | > >>>> | > Software Engineer | > >>>> | > Core Services Team | > >>>> | > Red Hat Czech, s.r.o. | > >>>> | > | > >>>> | > ----- Original Message ----- | > >>>> | > | From: "Anders Wallin" <walli...@gmail.com> | > >>>> | > | To: "Josef Ridky" <jri...@redhat.com> | > >>>> | > | Cc: "net-snmp-coders" <net-snmp-coders@lists.sourceforge.net> | > >>>> | > | Sent: Tuesday, April 2, 2019 1:46:40 PM | > >>>> | > | Subject: Re: Core dump with net-snmp-5.8 | > >>>> | > | | > >>>> | > | Hi Josef, | > >>>> | > | | > >>>> | > | I think it's the same issue as | > >>>> | > https://sourceforge.net/p/net-snmp/bugs/2914/ | > >>>> | > | (where I also posted the solution) | > >>>> | > | Regards | > >>>> | > | Anders Wallin | > >>>> | > | | > >>>> | > | | > >>>> | > | On Tue, Apr 2, 2019 at 12:43 PM Josef Ridky <jri...@redhat.com> | > >>>> wrote: | > >>>> | > | | > >>>> | > | > Hi, | > >>>> | > | > | > >>>> | > | > recently, I have hit to an issue in net-snmp-5.8, that is | > >>>> connected to | > >>>> | > the | > >>>> | > | > bug report [1]. | > >>>> | > | > | > >>>> | > | > When I tried to run agentofdeath test from [1], snmpd daemon | > >> will | > >>>> crash | > >>>> | > | > with malloc(): smallbin double linked list corrupted or double | > >>>> free() | > >>>> | > issue | > >>>> | > | > and dumps core (see bellow). | > >>>> | > | > From log file, I can identified one issue with "Unknown | > >> operation". | > >>>> | > | > | > >>>> | > | > This issue is in the agentx_got_response function | > >>>> | > | > (agent/mibgroup/agentx/master.c). There isn't implemented | > action | > >>>> for | > >>>> | > | > NETSNMP_CALLBACK_OP_RESEND (defined in | > >>>> | > | > include/net-snmp/library/snmp_api.h). | > >>>> | > | > As result "Unknown operation 6 in agentx_got_response" is | > shown | > >> in | > >>>> log | > >>>> | > | > file. | > >>>> | > | > | > >>>> | > | > /var/log/messages | > >>>> | > | > ------------------------------- | > >>>> | > | > Mar 28 06:52:42 localhost snmpd[12073]: Unknown operation 6 in | > >>>> | > | > agentx_got_response | > >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: Unknown operation 6 in | > >>>> | > | > agentx_got_response | > >>>> | > | > Mar 28 06:52:43 localhost snmpd[12073]: malloc(): smallbin | > >> double | > >>>> | > linked | > >>>> | > | > list corrupted | > >>>> | > | > Mar 28 06:52:43 localhost systemd[1]: Started Process Core | > Dump | > >>>> (PID | > >>>> | > | > 13652/UID 0). | > >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Main | > >> process | > >>>> | > exited, | > >>>> | > | > code=dumped, status=6/ABRT | > >>>> | > | > Mar 28 06:52:48 localhost systemd[1]: snmpd.service: Failed | > with | > >>>> result | > >>>> | > | > 'core-dump'. | > >>>> | > | > ------------------------------- | > >>>> | > | > | > >>>> | > | > The "Unknown operation" callback is caused by newly added | > piece | > >> of | > >>>> | > code in | > >>>> | > | > snmplib/snmp_api.c: | > >>>> | > | > | > >>>> | > | > static int | > >>>> | > | > snmp_resend_request(struct session_list *slp, | > >> netsnmp_request_list | > >>>> | > *rp, | > >>>> | > | > int incr_retries) | > >>>> | > | > { | > >>>> | > | > | > >>>> | > | > ... | > >>>> | > | > | > >>>> | > | > tv.tv_sec += tv.tv_usec / 1000000L; | > >>>> | > | > tv.tv_usec %= 1000000L; | > >>>> | > | > rp->expireM = tv; | > >>>> | > | > + if (rp->callback) | > >>>> | > | > + rp->callback(NETSNMP_CALLBACK_OP_RESEND, sp, | > >>>> | > | > + rp->pdu->reqid, rp->pdu, | > rp->cb_data); | > >>>> | > | > } | > >>>> | > | > return 0; | > >>>> | > | > } | > >>>> | > | > | > >>>> | > | > | > >>>> | > | > When I tried to remove it, it just stop complaining about | > >>>> operation 6, | > >>>> | > but | > >>>> | > | > the core dump is still present. | > >>>> | > | > | > >>>> | > | > May I ask you for help with this issue? Do you have any idea, | > >> what | > >>>> | > causing | > >>>> | > | > this issue in 5.8 and how to fix it? | > >>>> | > | > I know, that Jan Safranek has fixed this for 5.7 by commit | > [2], | > >>>> but it | > >>>> | > | > looks like something other has changed and this issue is | > current | > >>>> again. | > >>>> | > | > | > >>>> | > | > [1] https://sourceforge.net/p/net-snmp/bugs/2411/ | > >>>> | > | > [2] | > >>>> | > | > | > >>>> | > | > >>>> | > >> | > https://github.com/net-snmp/net-snmp/commit/793d596838ff7cb48a73b675d62897c56c9e62df | > >>>> | > | > | > >>>> | > | > Regards | > >>>> | > | > | > >>>> | > | > Josef Ridky | > >>>> | > | > Software Engineer | > >>>> | > | > Core Services Team | > >>>> | > | > Red Hat Czech, s.r.o. | > >>>> | > | > | > >>>> | > | > | > >>>> | > | > | > >>>> | > | > _______________________________________________ | > >>>> | > | > Net-snmp-coders mailing list | > >>>> | > | > Net-snmp-coders@lists.sourceforge.net | > >>>> | > | > https://lists.sourceforge.net/lists/listinfo/net-snmp-coders | > >>>> | > | > | > >>>> | > | | > >>>> | > | > >>>> | | > >>>> | > >>> | > >>> | > >>> | > >>> _______________________________________________ | > >>> Net-snmp-coders mailing list | > >>> Net-snmp-coders@lists.sourceforge.net | > >>> https://lists.sourceforge.net/lists/listinfo/net-snmp-coders | > >>> | > >> | > > | > | _______________________________________________ Net-snmp-coders mailing list Net-snmp-coders@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/net-snmp-coders