Ping!
Thanks, Andrew Andrew Burgess <[email protected]> writes: > We had a bug reported against Fedora GDB where a particular D symbol > was causing an out of memory error. I believe that the symbol is > valid (see below for details), but I'm not sure if it is possible to > successfully demangle it, so this patch aims to avoid the out of > memory error. > > The symbol in question can be found in the newly added test case. > > The symbol in question makes heavy use of back references to compress > the demangled string. The back references are nested to ~57 levels of > depth, but that's not the real problem. The real issue is that many > of these nested levels of back reference refer, multiple times, to the > earlier levels. If the innermost level of back reference results in a > 1 byte string (it doesn't, the resulting string is longer), and each > level of back reference refers to the next inner level just twice > (it's usually more), then by 30 levels the resulting string is 1G in > size. This is an absolute lower bound of the required size. By 40 > levels we have, at a minimum, a 1024G string. > > My plan for fixing this is to count nested back references, resetting > the counter at the "top level", i.e. at a nesting depth of 0. If we > see too many (based on some arbitrary limit) nested back references, > then we'll abort the demangle, and return NULL. > > Now, this isn't purely a recursion problem. The theoretical maximum > depth is just 57, but the problem is the cumulative work that ends up > being done due to the exponential growth of each level referencing the > earlier levels. > > However, I wanted to avoid adding a separate flag for this to the > demangler, and I figured that this problem is close enough to > recursion that we can reuse the existing recursion limit flag here > too. > > And so, the solution I propose is that each time we encounter a nested > back reference we increment a counter. If we see more than > DEMANGLE_RECURSION_LIMIT nested back references then we abort the > demangle, even if these back references are not all nested one inside > the other. A second counter tracks the current depth of nested back > references. If we are at depth 0 then the first counter, the nested > back reference counter, is reset back to 0. > > With this fix in place, when passed the problematic mangled symbol, > the demangler returns NULL. The c++filt tool will just re-print the > mangled symbol. The D demangler tests are extended to include the > problematic symbol. > > Side note: I mentioned above that I think this symbol is valid. To > reach this conclusion I implemented a hack where I cached the result > of evaluating a back reference. The first time a back reference is > processed I actually processed the input to check the string demangled > correctly, but I never added this demangled string to the final > result. The second time the same back reference is requested, I spot > the cached entry and return, again adding nothing to the result string. > > Thus, the final result string contains only the content from the top > level. None of the back referenced content is included. The input > string (mangled) is 2695 bytes, while the output string was 42kB, but > this skips all the nested content. However, we did get to the end of > the input string, and returned a non-NULL value. The full string > would be multiple GB given the exponential growth. > > Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2368350 > --- > libiberty/d-demangle.c | 53 +++++++++++++++++++++++-- > libiberty/testsuite/d-demangle-expected | 12 ++++++ > 2 files changed, 62 insertions(+), 3 deletions(-) > > diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c > index f059f5690bb..db179244519 100644 > --- a/libiberty/d-demangle.c > +++ b/libiberty/d-demangle.c > @@ -175,8 +175,33 @@ struct dlang_info > { > /* The string we are demangling. */ > const char *s; > + > + /* The options passed to the demangler. */ > + int options; > + > /* The index of the last back reference. */ > int last_backref; > + > + /* The total number of back referenced types, including nested back > + referenced types. This is reset to zero each time a back > + reference is processed at the "top level" (i.e. not a nested back > + reference). > + > + Cases have been encountered where a back reference type includes > + references to other back reference types, which include further > + back references. This was observed to a depth of 57. If each > + layer only references the layer before twice then the innermost > + string will be duplicated 2^57 times! > + > + We count the number of nested back references and compare this to > + the recursion limit, even though this isn't strictly recursion. */ > + int num_backrefs; > + > + /* Track the depth of back references. Depth 0 is considered the > + top level. When we encounter a back reference at this depth the > + NUM_BACKREFS field is reset to 0. At greater depths, > + NUM_BACKREFS will be incremented. */ > + int backref_depth; > }; > > /* Pass as the LEN to dlang_parse_template if symbol length is not known. */ > @@ -411,6 +436,24 @@ dlang_type_backref (string *decl, const char *mangled, > struct dlang_info *info, > if (mangled - info->s >= info->last_backref) > return NULL; > > + /* A back reference might point to a type that itself contains back > + references, which might themselves contain back references. > + There might not be recursion going on, but we can run into > + problems of exponential growth caused by the inner type appearing > + to be referenced 10s of millions of times. */ > + if (info->backref_depth > 0) > + { > + if ((info->options & DMGL_NO_RECURSE_LIMIT) == 0 > + && info->num_backrefs > DEMANGLE_RECURSION_LIMIT) > + return NULL; > + > + info->num_backrefs++; > + } > + else > + info->num_backrefs = 0; > + > + info->backref_depth++; > + > int save_refpos = info->last_backref; > info->last_backref = mangled - info->s; > > @@ -424,6 +467,7 @@ dlang_type_backref (string *decl, const char *mangled, > struct dlang_info *info, > backref = dlang_type (decl, backref, info); > > info->last_backref = save_refpos; > + info->backref_depth--; > > if (backref == NULL) > return NULL; > @@ -1931,17 +1975,20 @@ dlang_parse_template (string *decl, const char > *mangled, > /* Initialize the information structure we use to pass around information. > */ > static void > dlang_demangle_init_info (const char *mangled, int last_backref, > - struct dlang_info *info) > + int options, struct dlang_info *info) > { > info->s = mangled; > + info->options = options; > info->last_backref = last_backref; > + info->num_backrefs = 0; > + info->backref_depth = 0; > } > > /* Extract and demangle the symbol in MANGLED. Returns the demangled > signature on success or NULL on failure. */ > > char * > -dlang_demangle (const char *mangled, int option ATTRIBUTE_UNUSED) > +dlang_demangle (const char *mangled, int options) > { > string decl; > char *demangled = NULL; > @@ -1962,7 +2009,7 @@ dlang_demangle (const char *mangled, int option > ATTRIBUTE_UNUSED) > { > struct dlang_info info; > > - dlang_demangle_init_info (mangled, strlen (mangled), &info); > + dlang_demangle_init_info (mangled, strlen (mangled), options, &info); > mangled = dlang_parse_mangle (&decl, mangled, &info); > > /* Check that the entire symbol was successfully demangled. */ > diff --git a/libiberty/testsuite/d-demangle-expected > b/libiberty/testsuite/d-demangle-expected > index cfbdf2a52cb..8c671b62316 100644 > --- a/libiberty/testsuite/d-demangle-expected > +++ b/libiberty/testsuite/d-demangle-expected > @@ -1475,3 +1475,15 @@ demangle.anonymous.foo > --format=auto > _D8demangle9anonymous03fooZ > demangle.anonymous.foo > +# > +# This symbol was seen in the wild. It has a pattern of back > +# references that result in exponential growth, with the innermost > +# type being referenced 10s of millions of times. We use the > +# demangler's recursion limit to place a hard cap on back reference > +# usage, which causes the demangler to safely bail out before > +# consuming an excessive amount of memory. Trying to demangle this > +# without the recursion limit in place will eventually cause the OOM > +# killer to kick in. > +--format=dlang > +_D3std6format5write__T14formattedWriteTDFAxaZvTaTS2ae5utils4text7functor__T13stringifiableTSQBqQBqQBi10primitives__TQCaSQCsQCsQCk11composition__T3seqTSQDxQDxQDpQCh__TQDySQEqQEqQEnQEl__T17formattingFunctorVAyaa2_2573Vii250TQrZ15__lambda_L40_C3TQBmZ7FunctorTSQHzQHzQHrQGj__TQIaSQIsQIsQIpQIn__TQEcVQDma2_2573Vii250TSQKdQKdQKaQJy__TQJtTSQKxQKxQKpQJh__TQKyS4btdu2ui6curses6Curses4Wand__T8withAttrTSQNfQNfQNcQNa__TQMvTSQNzQNzQNrQMj__TQOaSQOsQOsQOkQMa__T6selectTSQPqQPqQPiQOa__TQPrSQQjQQjQQbQOt__T12valueFunctorTbZQrFbZ17__lambda_L143_C57TbZQLbTSQSvQSvQSnQRf__TQSwSQToQToQTlQTj__T5fmtIfVQOlnTbTPFNaNbNiNfZQPdTSQVhQVhQUzQTr__TQViSQKkQKi7browser7Browser6updateMFZ__T17__lambda_L1185_C4TQSeTSQYiQYiQYfQYd__TQXyTSQZcQZcQYuQXm__TQZdSQZvQZvQZnQXd__TQLdTQKzTSQBAtQBAuQBAnQZg__TQBAxSQBBqQBBrQBBpQBBo__TQIgVQWonTbTSQBCvQBCwQBCpQBBi__TQBDaSQBDtQBDuQBDnQBCg__TQNoTSQBErQBEsQBEqQBEp__TQBElTSQBFqQBFrQBFkQBEd__TQBFvSQBGoQBGpQBGiQBDz__TQBDqTSQBHnQBHoQBHhQBGa__TQBHsSQBIlQBImQBIkQBIj__TQBDzVQBDka2_2573Vii250TSQBKcQBKdQBKbQBKa__TQBJwTSQBLbQBLcQBKvQBJo__TQBLgSQBAjQBAiQBAjQBAgQBAd__TQBAcTQBHcTSQBNhQBNiQBNgQBNf__TQBNbTSQBOgQBOhQBOaQBMt__TQBOlSQBDoQBDnQBDoQBDlQBDi__TQBDhTQBKhZQBDrMFEQBFcQBFbQBFcQBEzQBEw9AttributebKQBLzZ19__lambda_L503_C20_1TPSQBHlQBHkQBHlQBHiQBHfTQDfTbTSQBUdQBUeQBUcQBUb__TQBTxTSQBVcQBVdQBUwQBTp__TQBVhSQBWaQBWbQBVuQBTl__TQBTcTQBTcZ3funTQBTmZQBPyZQBWtFQCwZ13StringifiableZQBRcZQBXxFQKwZQBeTQBTkZQBMuMFQJdbQBTzQMwQBUgZ17__lambda_L503_C20TQIfTQKrTbTSQCBpQCBqQCBoQCBn__TQCBjTSQCCoQCCpQCCiQCBb__TQCCtSQCDmQCDnQCDgQCAx__TQCAoTQCAoTSQCEqQCErQCEkQCDd__TQCEvSQCFoQCFpQCFnQCFm__TQCBcVQCAna2_2573Vii250TQTyZ17__lambda_L40_C3_1TQUwZQCArTQCEpZQLnTQCEyTQEkTQCFhZQCBtZQCIoFQHfZQLvZQCClZQCJgFQZkZQMnZ17__lambda_L40_C3_3TQBBlZQCEcTQCIaTQBEpZQPdTQBEyTQCItTQBFiZQCFkZQCMfFQBHuZQPnZQBWuFQBJhZ19__lambda_L143_C57_1TQBKiZQCHoTSQCPjQCPkQCPdQCNw__TQCPoSQCQhQCQiQCQbQCOu__TQCAcTQBHeZQCAmFQBHoZ19__lambda_L143_C57_2TQBIpZQCLgZQBYxFNibQBQoQEeZQWnTQBRaZQCMkTSQCUfQCUgQCTzQCSs__TQCUkSQCVdQCVeQCVcQCVb__TQCBtVQCQcnTbTQBTpTQHgZQCCpFNibQBUgQHwZQBAfTQIfZQCQcZQBAtTQCImTQBXoTQEhZQCRaZQCXvFQBZyZQBBdZQCCaMFQCTqQCBmbSQCOjQCOiQCEbQCDx13ScrollContextPSQCPq5paths11BrowserPathZ20__lambda_L1195_C76_1TQBxZQCVqZQCJhFNibQCJbQCIqZQBGyTQCJpZQCWwTSQDErQDEsQDElQDDe__TQDEwSQDFpQDFqQDFoQDFn__TQCMfVQDAonTbTQCMeTQCLuZQCNcFNibQCMwQCMlZQBKtTQCMvZQDArZQBLiTQCTbTQCQbTQEkZQDBpZQDIkFQCVqZQBLsZQCXeMFQBTnbQCXfZ19__lambda_L503_C20_2TQBSlTQBUyTbTSQDLxQDLyQDLwQDLv__TQDLrTSQDMwQDMxQDMqQDLj__TQDNbSQDNuQDNvQDNoQDLf__TQDKwTSQDOtQDOuQDOnQDNg__TQDOySQDPrQDPsQDPqQDPp__TQDLfVQDKqa2_2573Vii250TQDEdZ17__lambda_L40_C3_7TQDFcZQDKwZQBVnTQEdZQDLkZQDSfFQGoZQBVmZQDMdZQDSyFQDJgZQBWgZ17__lambda_L40_C3_8TQDLeZQDNwZQBYnTQDRzTQDOcZQDOqZQDVlFQDVaZQBYtZQDYbFKQDXqMxAaQDXoZk > +_D3std6format5write__T14formattedWriteTDFAxaZvTaTS2ae5utils4text7functor__T13stringifiableTSQBqQBqQBi10primitives__TQCaSQCsQCsQCk11composition__T3seqTSQDxQDxQDpQCh__TQDySQEqQEqQEnQEl__T17formattingFunctorVAyaa2_2573Vii250TQrZ15__lambda_L40_C3TQBmZ7FunctorTSQHzQHzQHrQGj__TQIaSQIsQIsQIpQIn__TQEcVQDma2_2573Vii250TSQKdQKdQKaQJy__TQJtTSQKxQKxQKpQJh__TQKyS4btdu2ui6curses6Curses4Wand__T8withAttrTSQNfQNfQNcQNa__TQMvTSQNzQNzQNrQMj__TQOaSQOsQOsQOkQMa__T6selectTSQPqQPqQPiQOa__TQPrSQQjQQjQQbQOt__T12valueFunctorTbZQrFbZ17__lambda_L143_C57TbZQLbTSQSvQSvQSnQRf__TQSwSQToQToQTlQTj__T5fmtIfVQOlnTbTPFNaNbNiNfZQPdTSQVhQVhQUzQTr__TQViSQKkQKi7browser7Browser6updateMFZ__T17__lambda_L1185_C4TQSeTSQYiQYiQYfQYd__TQXyTSQZcQZcQYuQXm__TQZdSQZvQZvQZnQXd__TQLdTQKzTSQBAtQBAuQBAnQZg__TQBAxSQBBqQBBrQBBpQBBo__TQIgVQWonTbTSQBCvQBCwQBCpQBBi__TQBDaSQBDtQBDuQBDnQBCg__TQNoTSQBErQBEsQBEqQBEp__TQBElTSQBFqQBFrQBFkQBEd__TQBFvSQBGoQBGpQBGiQBDz__TQBDqTSQBHnQBHoQBHhQBGa__TQBHsSQBIlQBImQBIkQBIj__TQBDzVQBDka2_2573Vii250TSQBKcQBKdQBKbQBKa__TQBJwTSQBLbQBLcQBKvQBJo__TQBLgSQBAjQBAiQBAjQBAgQBAd__TQBAcTQBHcTSQBNhQBNiQBNgQBNf__TQBNbTSQBOgQBOhQBOaQBMt__TQBOlSQBDoQBDnQBDoQBDlQBDi__TQBDhTQBKhZQBDrMFEQBFcQBFbQBFcQBEzQBEw9AttributebKQBLzZ19__lambda_L503_C20_1TPSQBHlQBHkQBHlQBHiQBHfTQDfTbTSQBUdQBUeQBUcQBUb__TQBTxTSQBVcQBVdQBUwQBTp__TQBVhSQBWaQBWbQBVuQBTl__TQBTcTQBTcZ3funTQBTmZQBPyZQBWtFQCwZ13StringifiableZQBRcZQBXxFQKwZQBeTQBTkZQBMuMFQJdbQBTzQMwQBUgZ17__lambda_L503_C20TQIfTQKrTbTSQCBpQCBqQCBoQCBn__TQCBjTSQCCoQCCpQCCiQCBb__TQCCtSQCDmQCDnQCDgQCAx__TQCAoTQCAoTSQCEqQCErQCEkQCDd__TQCEvSQCFoQCFpQCFnQCFm__TQCBcVQCAna2_2573Vii250TQTyZ17__lambda_L40_C3_1TQUwZQCArTQCEpZQLnTQCEyTQEkTQCFhZQCBtZQCIoFQHfZQLvZQCClZQCJgFQZkZQMnZ17__lambda_L40_C3_3TQBBlZQCEcTQCIaTQBEpZQPdTQBEyTQCItTQBFiZQCFkZQCMfFQBHuZQPnZQBWuFQBJhZ19__lambda_L143_C57_1TQBKiZQCHoTSQCPjQCPkQCPdQCNw__TQCPoSQCQhQCQiQCQbQCOu__TQCAcTQBHeZQCAmFQBHoZ19__lambda_L143_C57_2TQBIpZQCLgZQBYxFNibQBQoQEeZQWnTQBRaZQCMkTSQCUfQCUgQCTzQCSs__TQCUkSQCVdQCVeQCVcQCVb__TQCBtVQCQcnTbTQBTpTQHgZQCCpFNibQBUgQHwZQBAfTQIfZQCQcZQBAtTQCImTQBXoTQEhZQCRaZQCXvFQBZyZQBBdZQCCaMFQCTqQCBmbSQCOjQCOiQCEbQCDx13ScrollContextPSQCPq5paths11BrowserPathZ20__lambda_L1195_C76_1TQBxZQCVqZQCJhFNibQCJbQCIqZQBGyTQCJpZQCWwTSQDErQDEsQDElQDDe__TQDEwSQDFpQDFqQDFoQDFn__TQCMfVQDAonTbTQCMeTQCLuZQCNcFNibQCMwQCMlZQBKtTQCMvZQDArZQBLiTQCTbTQCQbTQEkZQDBpZQDIkFQCVqZQBLsZQCXeMFQBTnbQCXfZ19__lambda_L503_C20_2TQBSlTQBUyTbTSQDLxQDLyQDLwQDLv__TQDLrTSQDMwQDMxQDMqQDLj__TQDNbSQDNuQDNvQDNoQDLf__TQDKwTSQDOtQDOuQDOnQDNg__TQDOySQDPrQDPsQDPqQDPp__TQDLfVQDKqa2_2573Vii250TQDEdZ17__lambda_L40_C3_7TQDFcZQDKwZQBVnTQEdZQDLkZQDSfFQGoZQBVmZQDMdZQDSyFQDJgZQBWgZ17__lambda_L40_C3_8TQDLeZQDNwZQBYnTQDRzTQDOcZQDOqZQDVlFQDVaZQBYtZQDYbFKQDXqMxAaQDXoZk > > base-commit: edf0ae3b9f1ca23daaa55628683edc9a3360286f > -- > 2.25.4
