On 5/7/2026 3:33 AM, Andrew Burgess wrote: > Ping! > > Thanks, > Andrew > > Andrew Burgess <[email protected]> writes: > >> We had a bug reported against Fedora GDB where a particular D symbol >> was causing an out of memory error. I believe that the symbol is >> valid (see below for details), but I'm not sure if it is possible to >> successfully demangle it, so this patch aims to avoid the out of >> memory error. >> >> The symbol in question can be found in the newly added test case. >> >> The symbol in question makes heavy use of back references to compress >> the demangled string. The back references are nested to ~57 levels of >> depth, but that's not the real problem. The real issue is that many >> of these nested levels of back reference refer, multiple times, to the >> earlier levels. If the innermost level of back reference results in a >> 1 byte string (it doesn't, the resulting string is longer), and each >> level of back reference refers to the next inner level just twice >> (it's usually more), then by 30 levels the resulting string is 1G in >> size. This is an absolute lower bound of the required size. By 40 >> levels we have, at a minimum, a 1024G string. >> >> My plan for fixing this is to count nested back references, resetting >> the counter at the "top level", i.e. at a nesting depth of 0. If we >> see too many (based on some arbitrary limit) nested back references, >> then we'll abort the demangle, and return NULL. >> >> Now, this isn't purely a recursion problem. The theoretical maximum >> depth is just 57, but the problem is the cumulative work that ends up >> being done due to the exponential growth of each level referencing the >> earlier levels. >> >> However, I wanted to avoid adding a separate flag for this to the >> demangler, and I figured that this problem is close enough to >> recursion that we can reuse the existing recursion limit flag here >> too. >> >> And so, the solution I propose is that each time we encounter a nested >> back reference we increment a counter. If we see more than >> DEMANGLE_RECURSION_LIMIT nested back references then we abort the >> demangle, even if these back references are not all nested one inside >> the other. A second counter tracks the current depth of nested back >> references. If we are at depth 0 then the first counter, the nested >> back reference counter, is reset back to 0. >> >> With this fix in place, when passed the problematic mangled symbol, >> the demangler returns NULL. The c++filt tool will just re-print the >> mangled symbol. The D demangler tests are extended to include the >> problematic symbol. >> >> Side note: I mentioned above that I think this symbol is valid. To >> reach this conclusion I implemented a hack where I cached the result >> of evaluating a back reference. The first time a back reference is >> processed I actually processed the input to check the string demangled >> correctly, but I never added this demangled string to the final >> result. The second time the same back reference is requested, I spot >> the cached entry and return, again adding nothing to the result string. >> >> Thus, the final result string contains only the content from the top >> level. None of the back referenced content is included. The input >> string (mangled) is 2695 bytes, while the output string was 42kB, but >> this skips all the nested content. However, we did get to the end of >> the input string, and returned a non-NULL value. The full string >> would be multiple GB given the exponential growth. >> >> Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2368350 Needs a ChangeLog entry. Given it's referencing a Red Hat BZ entry, I don't think you can or should put a BZ marker in the ChangeLog entry. God only knows what'll happen if you were to include that and the bot would try to update that invalid bug # in the GCC bug database.
jeff
