Jeffrey Law <[email protected]> writes: > Needs a ChangeLog entry. Given it's referencing a Red Hat BZ entry, I > don't think you can or should put a BZ marker in the ChangeLog entry. > God only knows what'll happen if you were to include that and the bot > would try to update that invalid bug # in the GCC bug database.
An updated patch is below. The only change is the addition of a ChangeLog entry to the end of the commit message. Thanks, Andrew --- commit 00f59c74236825cfc211c9dce4df5349402c5371 Author: Andrew Burgess <[email protected]> Date: Tue Apr 14 11:20:55 2026 +0100 libiberty: avoid exponential back reference issue in D demangler We had a bug reported against Fedora GDB where a particular D symbol was causing an out of memory error. I believe that the symbol is valid (see below for details), but I'm not sure if it is possible to successfully demangle it, so this patch aims to avoid the out of memory error. The symbol in question can be found in the newly added test case. The symbol in question makes heavy use of back references to compress the demangled string. The back references are nested to ~57 levels of depth, but that's not the real problem. The real issue is that many of these nested levels of back reference refer, multiple times, to the earlier levels. If the innermost level of back reference results in a 1 byte string (it doesn't, the resulting string is longer), and each level of back reference refers to the next inner level just twice (it's usually more), then by 30 levels the resulting string is 1G in size. This is an absolute lower bound of the required size. By 40 levels we have, at a minimum, a 1024G string. My plan for fixing this is to count nested back references, resetting the counter at the "top level", i.e. at a nesting depth of 0. If we see too many (based on some arbitrary limit) nested back references, then we'll abort the demangle, and return NULL. Now, this isn't purely a recursion problem. The theoretical maximum depth is just 57, but the problem is the cumulative work that ends up being done due to the exponential growth of each level referencing the earlier levels. However, I wanted to avoid adding a separate flag for this to the demangler, and I figured that this problem is close enough to recursion that we can reuse the existing recursion limit flag here too. And so, the solution I propose is that each time we encounter a nested back reference we increment a counter. If we see more than DEMANGLE_RECURSION_LIMIT nested back references then we abort the demangle, even if these back references are not all nested one inside the other. A second counter tracks the current depth of nested back references. If we are at depth 0 then the first counter, the nested back reference counter, is reset back to 0. With this fix in place, when passed the problematic mangled symbol, the demangler returns NULL. The c++filt tool will just re-print the mangled symbol. The D demangler tests are extended to include the problematic symbol. Side note: I mentioned above that I think this symbol is valid. To reach this conclusion I implemented a hack where I cached the result of evaluating a back reference. The first time a back reference is processed I actually processed the input to check the string demangled correctly, but I never added this demangled string to the final result. The second time the same back reference is requested, I spot the cached entry and return, again adding nothing to the result string. Thus, the final result string contains only the content from the top level. None of the back referenced content is included. The input string (mangled) is 2695 bytes, while the output string was 42kB, but this skips all the nested content. However, we did get to the end of the input string, and returned a non-NULL value. The full string would be multiple GB given the exponential growth. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2368350 libiberty/ChangeLog: * d-demangle.c (struct dlang_info::options): New field, carries options passed to the demangler. (struct dlang_info::num_backrefs): Count total number of back referenced types that have been processed. (struct dlang_info::backref_depth): Count the depth of recursive back references. (dlang_type_backref): Track recursive back reference depth, and the total number of back references encountered. Bail out early if the total number gets too high. (dlang_demangle_init_info): Initialise new 'struct dlang_info' fields. (dlang_demangle): Pass demangler options to dlang_demangle_init_info. * testsuite/d-demangle-expected: Add new test. diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c index f059f5690bb..db179244519 100644 --- a/libiberty/d-demangle.c +++ b/libiberty/d-demangle.c @@ -175,8 +175,33 @@ struct dlang_info { /* The string we are demangling. */ const char *s; + + /* The options passed to the demangler. */ + int options; + /* The index of the last back reference. */ int last_backref; + + /* The total number of back referenced types, including nested back + referenced types. This is reset to zero each time a back + reference is processed at the "top level" (i.e. not a nested back + reference). + + Cases have been encountered where a back reference type includes + references to other back reference types, which include further + back references. This was observed to a depth of 57. If each + layer only references the layer before twice then the innermost + string will be duplicated 2^57 times! + + We count the number of nested back references and compare this to + the recursion limit, even though this isn't strictly recursion. */ + int num_backrefs; + + /* Track the depth of back references. Depth 0 is considered the + top level. When we encounter a back reference at this depth the + NUM_BACKREFS field is reset to 0. At greater depths, + NUM_BACKREFS will be incremented. */ + int backref_depth; }; /* Pass as the LEN to dlang_parse_template if symbol length is not known. */ @@ -411,6 +436,24 @@ dlang_type_backref (string *decl, const char *mangled, struct dlang_info *info, if (mangled - info->s >= info->last_backref) return NULL; + /* A back reference might point to a type that itself contains back + references, which might themselves contain back references. + There might not be recursion going on, but we can run into + problems of exponential growth caused by the inner type appearing + to be referenced 10s of millions of times. */ + if (info->backref_depth > 0) + { + if ((info->options & DMGL_NO_RECURSE_LIMIT) == 0 + && info->num_backrefs > DEMANGLE_RECURSION_LIMIT) + return NULL; + + info->num_backrefs++; + } + else + info->num_backrefs = 0; + + info->backref_depth++; + int save_refpos = info->last_backref; info->last_backref = mangled - info->s; @@ -424,6 +467,7 @@ dlang_type_backref (string *decl, const char *mangled, struct dlang_info *info, backref = dlang_type (decl, backref, info); info->last_backref = save_refpos; + info->backref_depth--; if (backref == NULL) return NULL; @@ -1931,17 +1975,20 @@ dlang_parse_template (string *decl, const char *mangled, /* Initialize the information structure we use to pass around information. */ static void dlang_demangle_init_info (const char *mangled, int last_backref, - struct dlang_info *info) + int options, struct dlang_info *info) { info->s = mangled; + info->options = options; info->last_backref = last_backref; + info->num_backrefs = 0; + info->backref_depth = 0; } /* Extract and demangle the symbol in MANGLED. Returns the demangled signature on success or NULL on failure. */ char * -dlang_demangle (const char *mangled, int option ATTRIBUTE_UNUSED) +dlang_demangle (const char *mangled, int options) { string decl; char *demangled = NULL; @@ -1962,7 +2009,7 @@ dlang_demangle (const char *mangled, int option ATTRIBUTE_UNUSED) { struct dlang_info info; - dlang_demangle_init_info (mangled, strlen (mangled), &info); + dlang_demangle_init_info (mangled, strlen (mangled), options, &info); mangled = dlang_parse_mangle (&decl, mangled, &info); /* Check that the entire symbol was successfully demangled. */ diff --git a/libiberty/testsuite/d-demangle-expected b/libiberty/testsuite/d-demangle-expected index cfbdf2a52cb..8c671b62316 100644 --- a/libiberty/testsuite/d-demangle-expected +++ b/libiberty/testsuite/d-demangle-expected @@ -1475,3 +1475,15 @@ demangle.anonymous.foo --format=auto _D8demangle9anonymous03fooZ demangle.anonymous.foo +# +# This symbol was seen in the wild. It has a pattern of back +# references that result in exponential growth, with the innermost +# type being referenced 10s of millions of times. We use the +# demangler's recursion limit to place a hard cap on back reference +# usage, which causes the demangler to safely bail out before +# consuming an excessive amount of memory. Trying to demangle this +# without the recursion limit in place will eventually cause the OOM +# killer to kick in. +--format=dlang +_D3std6format5write__T14formattedWriteTDFAxaZvTaTS2ae5utils4text7functor__T13stringifiableTSQBqQBqQBi10primitives__TQCaSQCsQCsQCk11composition__T3seqTSQDxQDxQDpQCh__TQDySQEqQEqQEnQEl__T17formattingFunctorVAyaa2_2573Vii250TQrZ15__lambda_L40_C3TQBmZ7FunctorTSQHzQHzQHrQGj__TQIaSQIsQIsQIpQIn__TQEcVQDma2_2573Vii250TSQKdQKdQKaQJy__TQJtTSQKxQKxQKpQJh__TQKyS4btdu2ui6curses6Curses4Wand__T8withAttrTSQNfQNfQNcQNa__TQMvTSQNzQNzQNrQMj__TQOaSQOsQOsQOkQMa__T6selectTSQPqQPqQPiQOa__TQPrSQQjQQjQQbQOt__T12valueFunctorTbZQrFbZ17__lambda_L143_C57TbZQLbTSQSvQSvQSnQRf__TQSwSQToQToQTlQTj__T5fmtIfVQOlnTbTPFNaNbNiNfZQPdTSQVhQVhQUzQTr__TQViSQKkQKi7browser7Browser6updateMFZ__T17__lambda_L1185_C4TQSeTSQYiQYiQYfQYd__TQXyTSQZcQZcQYuQXm__TQZdSQZvQZvQZnQXd__TQLdTQKzTSQBAtQBAuQBAnQZg__TQBAxSQBBqQBBrQBBpQBBo__TQIgVQWonTbTSQBCvQBCwQBCpQBBi__TQBDaSQBDtQBDuQBDnQBCg__TQNoTSQBErQBEsQBEqQBEp__TQBElTSQBFqQBFrQBFkQBEd__TQBFvSQBGoQBGpQBGiQBDz__TQBDqTSQBHnQBHoQBHhQBGa__TQBHsSQBIlQBImQBIkQBIj__TQBDzVQBDka2_2573Vii250TSQBKcQBKdQBKbQBKa__TQBJwTSQBLbQBLcQBKvQBJo__TQBLgSQBAjQBAiQBAjQBAgQBAd__TQBAcTQBHcTSQBNhQBNiQBNgQBNf__TQBNbTSQBOgQBOhQBOaQBMt__TQBOlSQBDoQBDnQBDoQBDlQBDi__TQBDhTQBKhZQBDrMFEQBFcQBFbQBFcQBEzQBEw9AttributebKQBLzZ19__lambda_L503_C20_1TPSQBHlQBHkQBHlQBHiQBHfTQDfTbTSQBUdQBUeQBUcQBUb__TQBTxTSQBVcQBVdQBUwQBTp__TQBVhSQBWaQBWbQBVuQBTl__TQBTcTQBTcZ3funTQBTmZQBPyZQBWtFQCwZ13StringifiableZQBRcZQBXxFQKwZQBeTQBTkZQBMuMFQJdbQBTzQMwQBUgZ17__lambda_L503_C20TQIfTQKrTbTSQCBpQCBqQCBoQCBn__TQCBjTSQCCoQCCpQCCiQCBb__TQCCtSQCDmQCDnQCDgQCAx__TQCAoTQCAoTSQCEqQCErQCEkQCDd__TQCEvSQCFoQCFpQCFnQCFm__TQCBcVQCAna2_2573Vii250TQTyZ17__lambda_L40_C3_1TQUwZQCArTQCEpZQLnTQCEyTQEkTQCFhZQCBtZQCIoFQHfZQLvZQCClZQCJgFQZkZQMnZ17__lambda_L40_C3_3TQBBlZQCEcTQCIaTQBEpZQPdTQBEyTQCItTQBFiZQCFkZQCMfFQBHuZQPnZQBWuFQBJhZ19__lambda_L143_C57_1TQBKiZQCHoTSQCPjQCPkQCPdQCNw__TQCPoSQCQhQCQiQCQbQCOu__TQCAcTQBHeZQCAmFQBHoZ19__lambda_L143_C57_2TQBIpZQCLgZQBYxFNibQBQoQEeZQWnTQBRaZQCMkTSQCUfQCUgQCTzQCSs__TQCUkSQCVdQCVeQCVcQCVb__TQCBtVQCQcnTbTQBTpTQHgZQCCpFNibQBUgQHwZQBAfTQIfZQCQcZQBAtTQCImTQBXoTQEhZQCRaZQCXvFQBZyZQBBdZQCCaMFQCTqQCBmbSQCOjQCOiQCEbQCDx13ScrollContextPSQCPq5paths11BrowserPathZ20__lambda_L1195_C76_1TQBxZQCVqZQCJhFNibQCJbQCIqZQBGyTQCJpZQCWwTSQDErQDEsQDElQDDe__TQDEwSQDFpQDFqQDFoQDFn__TQCMfVQDAonTbTQCMeTQCLuZQCNcFNibQCMwQCMlZQBKtTQCMvZQDArZQBLiTQCTbTQCQbTQEkZQDBpZQDIkFQCVqZQBLsZQCXeMFQBTnbQCXfZ19__lambda_L503_C20_2TQBSlTQBUyTbTSQDLxQDLyQDLwQDLv__TQDLrTSQDMwQDMxQDMqQDLj__TQDNbSQDNuQDNvQDNoQDLf__TQDKwTSQDOtQDOuQDOnQDNg__TQDOySQDPrQDPsQDPqQDPp__TQDLfVQDKqa2_2573Vii250TQDEdZ17__lambda_L40_C3_7TQDFcZQDKwZQBVnTQEdZQDLkZQDSfFQGoZQBVmZQDMdZQDSyFQDJgZQBWgZ17__lambda_L40_C3_8TQDLeZQDNwZQBYnTQDRzTQDOcZQDOqZQDVlFQDVaZQBYtZQDYbFKQDXqMxAaQDXoZk +_D3std6format5write__T14formattedWriteTDFAxaZvTaTS2ae5utils4text7functor__T13stringifiableTSQBqQBqQBi10primitives__TQCaSQCsQCsQCk11composition__T3seqTSQDxQDxQDpQCh__TQDySQEqQEqQEnQEl__T17formattingFunctorVAyaa2_2573Vii250TQrZ15__lambda_L40_C3TQBmZ7FunctorTSQHzQHzQHrQGj__TQIaSQIsQIsQIpQIn__TQEcVQDma2_2573Vii250TSQKdQKdQKaQJy__TQJtTSQKxQKxQKpQJh__TQKyS4btdu2ui6curses6Curses4Wand__T8withAttrTSQNfQNfQNcQNa__TQMvTSQNzQNzQNrQMj__TQOaSQOsQOsQOkQMa__T6selectTSQPqQPqQPiQOa__TQPrSQQjQQjQQbQOt__T12valueFunctorTbZQrFbZ17__lambda_L143_C57TbZQLbTSQSvQSvQSnQRf__TQSwSQToQToQTlQTj__T5fmtIfVQOlnTbTPFNaNbNiNfZQPdTSQVhQVhQUzQTr__TQViSQKkQKi7browser7Browser6updateMFZ__T17__lambda_L1185_C4TQSeTSQYiQYiQYfQYd__TQXyTSQZcQZcQYuQXm__TQZdSQZvQZvQZnQXd__TQLdTQKzTSQBAtQBAuQBAnQZg__TQBAxSQBBqQBBrQBBpQBBo__TQIgVQWonTbTSQBCvQBCwQBCpQBBi__TQBDaSQBDtQBDuQBDnQBCg__TQNoTSQBErQBEsQBEqQBEp__TQBElTSQBFqQBFrQBFkQBEd__TQBFvSQBGoQBGpQBGiQBDz__TQBDqTSQBHnQBHoQBHhQBGa__TQBHsSQBIlQBImQBIkQBIj__TQBDzVQBDka2_2573Vii250TSQBKcQBKdQBKbQBKa__TQBJwTSQBLbQBLcQBKvQBJo__TQBLgSQBAjQBAiQBAjQBAgQBAd__TQBAcTQBHcTSQBNhQBNiQBNgQBNf__TQBNbTSQBOgQBOhQBOaQBMt__TQBOlSQBDoQBDnQBDoQBDlQBDi__TQBDhTQBKhZQBDrMFEQBFcQBFbQBFcQBEzQBEw9AttributebKQBLzZ19__lambda_L503_C20_1TPSQBHlQBHkQBHlQBHiQBHfTQDfTbTSQBUdQBUeQBUcQBUb__TQBTxTSQBVcQBVdQBUwQBTp__TQBVhSQBWaQBWbQBVuQBTl__TQBTcTQBTcZ3funTQBTmZQBPyZQBWtFQCwZ13StringifiableZQBRcZQBXxFQKwZQBeTQBTkZQBMuMFQJdbQBTzQMwQBUgZ17__lambda_L503_C20TQIfTQKrTbTSQCBpQCBqQCBoQCBn__TQCBjTSQCCoQCCpQCCiQCBb__TQCCtSQCDmQCDnQCDgQCAx__TQCAoTQCAoTSQCEqQCErQCEkQCDd__TQCEvSQCFoQCFpQCFnQCFm__TQCBcVQCAna2_2573Vii250TQTyZ17__lambda_L40_C3_1TQUwZQCArTQCEpZQLnTQCEyTQEkTQCFhZQCBtZQCIoFQHfZQLvZQCClZQCJgFQZkZQMnZ17__lambda_L40_C3_3TQBBlZQCEcTQCIaTQBEpZQPdTQBEyTQCItTQBFiZQCFkZQCMfFQBHuZQPnZQBWuFQBJhZ19__lambda_L143_C57_1TQBKiZQCHoTSQCPjQCPkQCPdQCNw__TQCPoSQCQhQCQiQCQbQCOu__TQCAcTQBHeZQCAmFQBHoZ19__lambda_L143_C57_2TQBIpZQCLgZQBYxFNibQBQoQEeZQWnTQBRaZQCMkTSQCUfQCUgQCTzQCSs__TQCUkSQCVdQCVeQCVcQCVb__TQCBtVQCQcnTbTQBTpTQHgZQCCpFNibQBUgQHwZQBAfTQIfZQCQcZQBAtTQCImTQBXoTQEhZQCRaZQCXvFQBZyZQBBdZQCCaMFQCTqQCBmbSQCOjQCOiQCEbQCDx13ScrollContextPSQCPq5paths11BrowserPathZ20__lambda_L1195_C76_1TQBxZQCVqZQCJhFNibQCJbQCIqZQBGyTQCJpZQCWwTSQDErQDEsQDElQDDe__TQDEwSQDFpQDFqQDFoQDFn__TQCMfVQDAonTbTQCMeTQCLuZQCNcFNibQCMwQCMlZQBKtTQCMvZQDArZQBLiTQCTbTQCQbTQEkZQDBpZQDIkFQCVqZQBLsZQCXeMFQBTnbQCXfZ19__lambda_L503_C20_2TQBSlTQBUyTbTSQDLxQDLyQDLwQDLv__TQDLrTSQDMwQDMxQDMqQDLj__TQDNbSQDNuQDNvQDNoQDLf__TQDKwTSQDOtQDOuQDOnQDNg__TQDOySQDPrQDPsQDPqQDPp__TQDLfVQDKqa2_2573Vii250TQDEdZ17__lambda_L40_C3_7TQDFcZQDKwZQBVnTQEdZQDLkZQDSfFQGoZQBVmZQDMdZQDSyFQDJgZQBWgZ17__lambda_L40_C3_8TQDLeZQDNwZQBYnTQDRzTQDOcZQDOqZQDVlFQDVaZQBYtZQDYbFKQDXqMxAaQDXoZk
