To try to make it easier I've created a fake pull request, allowing
comments on the diff and discussion, and a clear set of diffs between this
stuff and 5.8-patches:

https://github.com/fenner/net-snmp/pull/3

  Bill


On Mon, May 20, 2019 at 11:27 AM Bill Fenner <fen...@gmail.com> wrote:

> Some of the back and forth with rstory happened on the admin list; some of
> it was unicast.  I put together all of the patches, even though there are
> now at least 2 fixes for the problem, and published them at
> https://github.com/fenner/net-snmp/commits/rstory-getbulk-patches .
>
> I can't imagine that the changes are useful as they are (other than the
> added debugging, perhaps) but may be useful to see what kinds of things we
> ran into.
>
> And my opinion remains that the fundamental problem here is that the
> reverse encoding knows how to stop encoding in time, but doesn't know about
> the overhead that usm adds, and for some reason the changes in 5.8 try to
> address this by switching to forward encoding, which relies on the state
> that was freed (and really probably only helps because forward encoding
> uses a fixed-sized 1472-byte buffer).  (It's possible that the delayed free
> that Sam introduced helps the "state that was freed" part.  With 3
> different solutions for the problem it's hard to figure out exactly how to
> proceed.)
>
>   Bill
>
>
> On Mon, May 20, 2019 at 9:28 AM Bill Fenner <fen...@gmail.com> wrote:
>
>> On Tue, May 14, 2019 at 1:38 PM Bart Van Assche <bvanass...@acm.org>
>> wrote:
>>
>>> On 5/14/19 4:01 PM, Bill Fenner wrote:
>>> > Perhaps getbulk no longer dumps core, but I can not get it to return
>>> > anything but GENERR any more, and, it seems to leak memory.
>>> >
>>> > Any "large enough" request seems to fail in this way, e.g.,
>>> > snmpbulkget -v 3 ... -Cn 5 -Cr 50 sysUpTime sysUpTime sysUpTime
>>> > sysUpTime sysUpTime .1
>>> >
>>> > This is particularly frustrating because code was added to 5.8 to
>>> > rebuild a getbulk reply if it's too big.  But there was already code
>>> > to not build the PDU too big, it's just not taking the v3 header into
>>> > account properly (that's my best guess, at least).  So now there are
>>> > two different mechanisms to send a "right-size" reply and they both
>>> > don't work.  Around 5.8 release time I was working with Robert Story
>>> > to fix this, but that effort kind of petered out and Robert's work
>>> > didn't make it into git.
>>> >
>>> > Can anyone get getbulk to work in the current 5.8-patches with SNMPv3?
>>> >
>>> > I've attached a test script with 504 failing test cases when I run it
>>> > against an unmodified V5-8-patches branch, and net-snmp leaks about a
>>> > meg of RAM during the test.  This is an adaptation of my internal test
>>> > to the net-snmp test framework; the complete test would use all
>>> > supported values for -l, -a, -x but for now this is the simplest one
>>> > using nanp.
>>>
>>>
>>> Hi Bill,
>>>
>>> A new test has been added
>>> (testing/fulltests/default/T0221snmpbulkget_large_simple). That test
>>> passes on my setup. Can you have a look whether that test covers the
>>> issue you ran into?
>>>
>>
>> It doesn't - the problems arise when the agent decides that the message
>> is too big and doesn't fit in the buffer.  (E.g., the usage
>> of --sendMessageMaxSize in my test, or, the repetition of .1 .1 .1 .1 on
>> the snmpbulkget command line).
>>
>> Regarding the "memory leak": RSS is not a reliable source of information
>>> to detect memory leaks. If you want to verify whether or not a new
>>> memory leak has been introduced please use Valgrind.
>>>
>>
>> Yes, my internal test uses valgrind and shows the leaks, I was using RSS
>> growth as a proxy.  I also used a script something like
>> http://www.net-snmp.org/wiki/index.php/Running_Net-SNMP_Regression_Tests_under_Valgrind
>>  and
>> it showed leaks when I ran the test that I sent, e.g.,
>>
>> ==32438== 31,584 (8,736 direct, 22,848 indirect) bytes in 168 blocks are
>> definitely lost in loss record 756 of 758
>> ==32438==    at 0x402D05E: calloc (vg_replace_malloc.c:711)
>> ==32438==    by 0x430EA7A: usm_malloc_usmStateReference (snmpusm.c:288)
>> ==32438==    by 0x43117A7: usm_process_in_msg (snmpusm.c:2431)
>> ==32438==    by 0x4312B44: usm_secmod_process_in_msg (snmpusm.c:2335)
>> ==32438==    by 0x42D45EF: snmpv3_parse (snmp_api.c:3938)
>>
>> When I run your new test, I agree it passes:
>> ~/src/test-for-bart/testing @fenner-test-for-bart.sjc% ./RUNFULLTESTS -r
>> T0221
>> large SNMPv3 bulkget .. ok
>>
>> If I add "--sendMessageMaxSize=1472", then it fails in the same way as my
>> test fails: returning genError.
>>
>> Your subsequent patch may address this problem if the manager does not
>> specify a max response size, but as this is part of the SNMPv3 protocol and
>> we do not necessarily control what the manager does, it's not sufficient.
>>
>>   Bill
>>
>>
_______________________________________________
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders

Reply via email to