On 5/7/2026 3:33 AM, Andrew Burgess wrote:
> Ping!
>
> Thanks,
> Andrew
>
> Andrew Burgess <[email protected]> writes:
>
>> We had a bug reported against Fedora GDB where a particular D symbol
>> was causing an out of memory error.  I believe that the symbol is
>> valid (see below for details), but I'm not sure if it is possible to
>> successfully demangle it, so this patch aims to avoid the out of
>> memory error.
>>
>> The symbol in question can be found in the newly added test case.
>>
>> The symbol in question makes heavy use of back references to compress
>> the demangled string.  The back references are nested to ~57 levels of
>> depth, but that's not the real problem.  The real issue is that many
>> of these nested levels of back reference refer, multiple times, to the
>> earlier levels.  If the innermost level of back reference results in a
>> 1 byte string (it doesn't, the resulting string is longer), and each
>> level of back reference refers to the next inner level just twice
>> (it's usually more), then by 30 levels the resulting string is 1G in
>> size.  This is an absolute lower bound of the required size.  By 40
>> levels we have, at a minimum, a 1024G string.
>>
>> My plan for fixing this is to count nested back references, resetting
>> the counter at the "top level", i.e. at a nesting depth of 0.  If we
>> see too many (based on some arbitrary limit) nested back references,
>> then we'll abort the demangle, and return NULL.
>>
>> Now, this isn't purely a recursion problem.  The theoretical maximum
>> depth is just 57, but the problem is the cumulative work that ends up
>> being done due to the exponential growth of each level referencing the
>> earlier levels.
>>
>> However, I wanted to avoid adding a separate flag for this to the
>> demangler, and I figured that this problem is close enough to
>> recursion that we can reuse the existing recursion limit flag here
>> too.
>>
>> And so, the solution I propose is that each time we encounter a nested
>> back reference we increment a counter.  If we see more than
>> DEMANGLE_RECURSION_LIMIT nested back references then we abort the
>> demangle, even if these back references are not all nested one inside
>> the other.  A second counter tracks the current depth of nested back
>> references.  If we are at depth 0 then the first counter, the nested
>> back reference counter, is reset back to 0.
>>
>> With this fix in place, when passed the problematic mangled symbol,
>> the demangler returns NULL.  The c++filt tool will just re-print the
>> mangled symbol.  The D demangler tests are extended to include the
>> problematic symbol.
>>
>> Side note: I mentioned above that I think this symbol is valid.  To
>> reach this conclusion I implemented a hack where I cached the result
>> of evaluating a back reference.  The first time a back reference is
>> processed I actually processed the input to check the string demangled
>> correctly, but I never added this demangled string to the final
>> result.  The second time the same back reference is requested, I spot
>> the cached entry and return, again adding nothing to the result string.
>>
>> Thus, the final result string contains only the content from the top
>> level.  None of the back referenced content is included.  The input
>> string (mangled) is 2695 bytes, while the output string was 42kB, but
>> this skips all the nested content.  However, we did get to the end of
>> the input string, and returned a non-NULL value.  The full string
>> would be multiple GB given the exponential growth.
>>
>> Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2368350
Needs a ChangeLog entry.   Given it's referencing a Red Hat BZ entry, I 
don't think you can or should put a BZ marker in the ChangeLog entry.  
God only knows what'll happen if you were to include that and the bot 
would try to update that invalid bug # in the GCC bug database.



jeff

Reply via email to