Ping!

Thanks,
Andrew

Andrew Burgess <[email protected]> writes:

> We had a bug reported against Fedora GDB where a particular D symbol
> was causing an out of memory error.  I believe that the symbol is
> valid (see below for details), but I'm not sure if it is possible to
> successfully demangle it, so this patch aims to avoid the out of
> memory error.
>
> The symbol in question can be found in the newly added test case.
>
> The symbol in question makes heavy use of back references to compress
> the demangled string.  The back references are nested to ~57 levels of
> depth, but that's not the real problem.  The real issue is that many
> of these nested levels of back reference refer, multiple times, to the
> earlier levels.  If the innermost level of back reference results in a
> 1 byte string (it doesn't, the resulting string is longer), and each
> level of back reference refers to the next inner level just twice
> (it's usually more), then by 30 levels the resulting string is 1G in
> size.  This is an absolute lower bound of the required size.  By 40
> levels we have, at a minimum, a 1024G string.
>
> My plan for fixing this is to count nested back references, resetting
> the counter at the "top level", i.e. at a nesting depth of 0.  If we
> see too many (based on some arbitrary limit) nested back references,
> then we'll abort the demangle, and return NULL.
>
> Now, this isn't purely a recursion problem.  The theoretical maximum
> depth is just 57, but the problem is the cumulative work that ends up
> being done due to the exponential growth of each level referencing the
> earlier levels.
>
> However, I wanted to avoid adding a separate flag for this to the
> demangler, and I figured that this problem is close enough to
> recursion that we can reuse the existing recursion limit flag here
> too.
>
> And so, the solution I propose is that each time we encounter a nested
> back reference we increment a counter.  If we see more than
> DEMANGLE_RECURSION_LIMIT nested back references then we abort the
> demangle, even if these back references are not all nested one inside
> the other.  A second counter tracks the current depth of nested back
> references.  If we are at depth 0 then the first counter, the nested
> back reference counter, is reset back to 0.
>
> With this fix in place, when passed the problematic mangled symbol,
> the demangler returns NULL.  The c++filt tool will just re-print the
> mangled symbol.  The D demangler tests are extended to include the
> problematic symbol.
>
> Side note: I mentioned above that I think this symbol is valid.  To
> reach this conclusion I implemented a hack where I cached the result
> of evaluating a back reference.  The first time a back reference is
> processed I actually processed the input to check the string demangled
> correctly, but I never added this demangled string to the final
> result.  The second time the same back reference is requested, I spot
> the cached entry and return, again adding nothing to the result string.
>
> Thus, the final result string contains only the content from the top
> level.  None of the back referenced content is included.  The input
> string (mangled) is 2695 bytes, while the output string was 42kB, but
> this skips all the nested content.  However, we did get to the end of
> the input string, and returned a non-NULL value.  The full string
> would be multiple GB given the exponential growth.
>
> Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2368350
> ---
>  libiberty/d-demangle.c                  | 53 +++++++++++++++++++++++--
>  libiberty/testsuite/d-demangle-expected | 12 ++++++
>  2 files changed, 62 insertions(+), 3 deletions(-)
>
> diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
> index f059f5690bb..db179244519 100644
> --- a/libiberty/d-demangle.c
> +++ b/libiberty/d-demangle.c
> @@ -175,8 +175,33 @@ struct dlang_info
>  {
>    /* The string we are demangling.  */
>    const char *s;
> +
> +  /* The options passed to the demangler.  */
> +  int options;
> +
>    /* The index of the last back reference.  */
>    int last_backref;
> +
> +  /* The total number of back referenced types, including nested back
> +     referenced types.  This is reset to zero each time a back
> +     reference is processed at the "top level" (i.e. not a nested back
> +     reference).
> +
> +     Cases have been encountered where a back reference type includes
> +     references to other back reference types, which include further
> +     back references.  This was observed to a depth of 57.  If each
> +     layer only references the layer before twice then the innermost
> +     string will be duplicated 2^57 times!
> +
> +     We count the number of nested back references and compare this to
> +     the recursion limit, even though this isn't strictly recursion.  */
> +  int num_backrefs;
> +
> +  /* Track the depth of back references.  Depth 0 is considered the
> +     top level.  When we encounter a back reference at this depth the
> +     NUM_BACKREFS field is reset to 0.  At greater depths,
> +     NUM_BACKREFS will be incremented.  */
> +  int backref_depth;
>  };
>  
>  /* Pass as the LEN to dlang_parse_template if symbol length is not known.  */
> @@ -411,6 +436,24 @@ dlang_type_backref (string *decl, const char *mangled, 
> struct dlang_info *info,
>    if (mangled - info->s >= info->last_backref)
>      return NULL;
>  
> +  /* A back reference might point to a type that itself contains back
> +     references, which might themselves contain back references.
> +     There might not be recursion going on, but we can run into
> +     problems of exponential growth caused by the inner type appearing
> +     to be referenced 10s of millions of times.  */
> +  if (info->backref_depth > 0)
> +    {
> +      if ((info->options & DMGL_NO_RECURSE_LIMIT) == 0
> +       && info->num_backrefs > DEMANGLE_RECURSION_LIMIT)
> +     return NULL;
> +
> +      info->num_backrefs++;
> +    }
> +  else
> +    info->num_backrefs = 0;
> +
> +  info->backref_depth++;
> +
>    int save_refpos = info->last_backref;
>    info->last_backref = mangled - info->s;
>  
> @@ -424,6 +467,7 @@ dlang_type_backref (string *decl, const char *mangled, 
> struct dlang_info *info,
>      backref = dlang_type (decl, backref, info);
>  
>    info->last_backref = save_refpos;
> +  info->backref_depth--;
>  
>    if (backref == NULL)
>      return NULL;
> @@ -1931,17 +1975,20 @@ dlang_parse_template (string *decl, const char 
> *mangled,
>  /* Initialize the information structure we use to pass around information.  
> */
>  static void
>  dlang_demangle_init_info (const char *mangled, int last_backref,
> -                       struct dlang_info *info)
> +                       int options, struct dlang_info *info)
>  {
>    info->s = mangled;
> +  info->options = options;
>    info->last_backref = last_backref;
> +  info->num_backrefs = 0;
> +  info->backref_depth = 0;
>  }
>  
>  /* Extract and demangle the symbol in MANGLED.  Returns the demangled
>     signature on success or NULL on failure.  */
>  
>  char *
> -dlang_demangle (const char *mangled, int option ATTRIBUTE_UNUSED)
> +dlang_demangle (const char *mangled, int options)
>  {
>    string decl;
>    char *demangled = NULL;
> @@ -1962,7 +2009,7 @@ dlang_demangle (const char *mangled, int option 
> ATTRIBUTE_UNUSED)
>      {
>        struct dlang_info info;
>  
> -      dlang_demangle_init_info (mangled, strlen (mangled), &info);
> +      dlang_demangle_init_info (mangled, strlen (mangled), options, &info);
>        mangled = dlang_parse_mangle (&decl, mangled, &info);
>  
>        /* Check that the entire symbol was successfully demangled.  */
> diff --git a/libiberty/testsuite/d-demangle-expected 
> b/libiberty/testsuite/d-demangle-expected
> index cfbdf2a52cb..8c671b62316 100644
> --- a/libiberty/testsuite/d-demangle-expected
> +++ b/libiberty/testsuite/d-demangle-expected
> @@ -1475,3 +1475,15 @@ demangle.anonymous.foo
>  --format=auto
>  _D8demangle9anonymous03fooZ
>  demangle.anonymous.foo
> +#
> +# This symbol was seen in the wild.  It has a pattern of back
> +# references that result in exponential growth, with the innermost
> +# type being referenced 10s of millions of times.  We use the
> +# demangler's recursion limit to place a hard cap on back reference
> +# usage, which causes the demangler to safely bail out before
> +# consuming an excessive amount of memory.  Trying to demangle this
> +# without the recursion limit in place will eventually cause the OOM
> +# killer to kick in.
> +--format=dlang
> +_D3std6format5write__T14formattedWriteTDFAxaZvTaTS2ae5utils4text7functor__T13stringifiableTSQBqQBqQBi10primitives__TQCaSQCsQCsQCk11composition__T3seqTSQDxQDxQDpQCh__TQDySQEqQEqQEnQEl__T17formattingFunctorVAyaa2_2573Vii250TQrZ15__lambda_L40_C3TQBmZ7FunctorTSQHzQHzQHrQGj__TQIaSQIsQIsQIpQIn__TQEcVQDma2_2573Vii250TSQKdQKdQKaQJy__TQJtTSQKxQKxQKpQJh__TQKyS4btdu2ui6curses6Curses4Wand__T8withAttrTSQNfQNfQNcQNa__TQMvTSQNzQNzQNrQMj__TQOaSQOsQOsQOkQMa__T6selectTSQPqQPqQPiQOa__TQPrSQQjQQjQQbQOt__T12valueFunctorTbZQrFbZ17__lambda_L143_C57TbZQLbTSQSvQSvQSnQRf__TQSwSQToQToQTlQTj__T5fmtIfVQOlnTbTPFNaNbNiNfZQPdTSQVhQVhQUzQTr__TQViSQKkQKi7browser7Browser6updateMFZ__T17__lambda_L1185_C4TQSeTSQYiQYiQYfQYd__TQXyTSQZcQZcQYuQXm__TQZdSQZvQZvQZnQXd__TQLdTQKzTSQBAtQBAuQBAnQZg__TQBAxSQBBqQBBrQBBpQBBo__TQIgVQWonTbTSQBCvQBCwQBCpQBBi__TQBDaSQBDtQBDuQBDnQBCg__TQNoTSQBErQBEsQBEqQBEp__TQBElTSQBFqQBFrQBFkQBEd__TQBFvSQBGoQBGpQBGiQBDz__TQBDqTSQBHnQBHoQBHhQBGa__TQBHsSQBIlQBImQBIkQBIj__TQBDzVQBDka2_2573Vii250TSQBKcQBKdQBKbQBKa__TQBJwTSQBLbQBLcQBKvQBJo__TQBLgSQBAjQBAiQBAjQBAgQBAd__TQBAcTQBHcTSQBNhQBNiQBNgQBNf__TQBNbTSQBOgQBOhQBOaQBMt__TQBOlSQBDoQBDnQBDoQBDlQBDi__TQBDhTQBKhZQBDrMFEQBFcQBFbQBFcQBEzQBEw9AttributebKQBLzZ19__lambda_L503_C20_1TPSQBHlQBHkQBHlQBHiQBHfTQDfTbTSQBUdQBUeQBUcQBUb__TQBTxTSQBVcQBVdQBUwQBTp__TQBVhSQBWaQBWbQBVuQBTl__TQBTcTQBTcZ3funTQBTmZQBPyZQBWtFQCwZ13StringifiableZQBRcZQBXxFQKwZQBeTQBTkZQBMuMFQJdbQBTzQMwQBUgZ17__lambda_L503_C20TQIfTQKrTbTSQCBpQCBqQCBoQCBn__TQCBjTSQCCoQCCpQCCiQCBb__TQCCtSQCDmQCDnQCDgQCAx__TQCAoTQCAoTSQCEqQCErQCEkQCDd__TQCEvSQCFoQCFpQCFnQCFm__TQCBcVQCAna2_2573Vii250TQTyZ17__lambda_L40_C3_1TQUwZQCArTQCEpZQLnTQCEyTQEkTQCFhZQCBtZQCIoFQHfZQLvZQCClZQCJgFQZkZQMnZ17__lambda_L40_C3_3TQBBlZQCEcTQCIaTQBEpZQPdTQBEyTQCItTQBFiZQCFkZQCMfFQBHuZQPnZQBWuFQBJhZ19__lambda_L143_C57_1TQBKiZQCHoTSQCPjQCPkQCPdQCNw__TQCPoSQCQhQCQiQCQbQCOu__TQCAcTQBHeZQCAmFQBHoZ19__lambda_L143_C57_2TQBIpZQCLgZQBYxFNibQBQoQEeZQWnTQBRaZQCMkTSQCUfQCUgQCTzQCSs__TQCUkSQCVdQCVeQCVcQCVb__TQCBtVQCQcnTbTQBTpTQHgZQCCpFNibQBUgQHwZQBAfTQIfZQCQcZQBAtTQCImTQBXoTQEhZQCRaZQCXvFQBZyZQBBdZQCCaMFQCTqQCBmbSQCOjQCOiQCEbQCDx13ScrollContextPSQCPq5paths11BrowserPathZ20__lambda_L1195_C76_1TQBxZQCVqZQCJhFNibQCJbQCIqZQBGyTQCJpZQCWwTSQDErQDEsQDElQDDe__TQDEwSQDFpQDFqQDFoQDFn__TQCMfVQDAonTbTQCMeTQCLuZQCNcFNibQCMwQCMlZQBKtTQCMvZQDArZQBLiTQCTbTQCQbTQEkZQDBpZQDIkFQCVqZQBLsZQCXeMFQBTnbQCXfZ19__lambda_L503_C20_2TQBSlTQBUyTbTSQDLxQDLyQDLwQDLv__TQDLrTSQDMwQDMxQDMqQDLj__TQDNbSQDNuQDNvQDNoQDLf__TQDKwTSQDOtQDOuQDOnQDNg__TQDOySQDPrQDPsQDPqQDPp__TQDLfVQDKqa2_2573Vii250TQDEdZ17__lambda_L40_C3_7TQDFcZQDKwZQBVnTQEdZQDLkZQDSfFQGoZQBVmZQDMdZQDSyFQDJgZQBWgZ17__lambda_L40_C3_8TQDLeZQDNwZQBYnTQDRzTQDOcZQDOqZQDVlFQDVaZQBYtZQDYbFKQDXqMxAaQDXoZk
> +_D3std6format5write__T14formattedWriteTDFAxaZvTaTS2ae5utils4text7functor__T13stringifiableTSQBqQBqQBi10primitives__TQCaSQCsQCsQCk11composition__T3seqTSQDxQDxQDpQCh__TQDySQEqQEqQEnQEl__T17formattingFunctorVAyaa2_2573Vii250TQrZ15__lambda_L40_C3TQBmZ7FunctorTSQHzQHzQHrQGj__TQIaSQIsQIsQIpQIn__TQEcVQDma2_2573Vii250TSQKdQKdQKaQJy__TQJtTSQKxQKxQKpQJh__TQKyS4btdu2ui6curses6Curses4Wand__T8withAttrTSQNfQNfQNcQNa__TQMvTSQNzQNzQNrQMj__TQOaSQOsQOsQOkQMa__T6selectTSQPqQPqQPiQOa__TQPrSQQjQQjQQbQOt__T12valueFunctorTbZQrFbZ17__lambda_L143_C57TbZQLbTSQSvQSvQSnQRf__TQSwSQToQToQTlQTj__T5fmtIfVQOlnTbTPFNaNbNiNfZQPdTSQVhQVhQUzQTr__TQViSQKkQKi7browser7Browser6updateMFZ__T17__lambda_L1185_C4TQSeTSQYiQYiQYfQYd__TQXyTSQZcQZcQYuQXm__TQZdSQZvQZvQZnQXd__TQLdTQKzTSQBAtQBAuQBAnQZg__TQBAxSQBBqQBBrQBBpQBBo__TQIgVQWonTbTSQBCvQBCwQBCpQBBi__TQBDaSQBDtQBDuQBDnQBCg__TQNoTSQBErQBEsQBEqQBEp__TQBElTSQBFqQBFrQBFkQBEd__TQBFvSQBGoQBGpQBGiQBDz__TQBDqTSQBHnQBHoQBHhQBGa__TQBHsSQBIlQBImQBIkQBIj__TQBDzVQBDka2_2573Vii250TSQBKcQBKdQBKbQBKa__TQBJwTSQBLbQBLcQBKvQBJo__TQBLgSQBAjQBAiQBAjQBAgQBAd__TQBAcTQBHcTSQBNhQBNiQBNgQBNf__TQBNbTSQBOgQBOhQBOaQBMt__TQBOlSQBDoQBDnQBDoQBDlQBDi__TQBDhTQBKhZQBDrMFEQBFcQBFbQBFcQBEzQBEw9AttributebKQBLzZ19__lambda_L503_C20_1TPSQBHlQBHkQBHlQBHiQBHfTQDfTbTSQBUdQBUeQBUcQBUb__TQBTxTSQBVcQBVdQBUwQBTp__TQBVhSQBWaQBWbQBVuQBTl__TQBTcTQBTcZ3funTQBTmZQBPyZQBWtFQCwZ13StringifiableZQBRcZQBXxFQKwZQBeTQBTkZQBMuMFQJdbQBTzQMwQBUgZ17__lambda_L503_C20TQIfTQKrTbTSQCBpQCBqQCBoQCBn__TQCBjTSQCCoQCCpQCCiQCBb__TQCCtSQCDmQCDnQCDgQCAx__TQCAoTQCAoTSQCEqQCErQCEkQCDd__TQCEvSQCFoQCFpQCFnQCFm__TQCBcVQCAna2_2573Vii250TQTyZ17__lambda_L40_C3_1TQUwZQCArTQCEpZQLnTQCEyTQEkTQCFhZQCBtZQCIoFQHfZQLvZQCClZQCJgFQZkZQMnZ17__lambda_L40_C3_3TQBBlZQCEcTQCIaTQBEpZQPdTQBEyTQCItTQBFiZQCFkZQCMfFQBHuZQPnZQBWuFQBJhZ19__lambda_L143_C57_1TQBKiZQCHoTSQCPjQCPkQCPdQCNw__TQCPoSQCQhQCQiQCQbQCOu__TQCAcTQBHeZQCAmFQBHoZ19__lambda_L143_C57_2TQBIpZQCLgZQBYxFNibQBQoQEeZQWnTQBRaZQCMkTSQCUfQCUgQCTzQCSs__TQCUkSQCVdQCVeQCVcQCVb__TQCBtVQCQcnTbTQBTpTQHgZQCCpFNibQBUgQHwZQBAfTQIfZQCQcZQBAtTQCImTQBXoTQEhZQCRaZQCXvFQBZyZQBBdZQCCaMFQCTqQCBmbSQCOjQCOiQCEbQCDx13ScrollContextPSQCPq5paths11BrowserPathZ20__lambda_L1195_C76_1TQBxZQCVqZQCJhFNibQCJbQCIqZQBGyTQCJpZQCWwTSQDErQDEsQDElQDDe__TQDEwSQDFpQDFqQDFoQDFn__TQCMfVQDAonTbTQCMeTQCLuZQCNcFNibQCMwQCMlZQBKtTQCMvZQDArZQBLiTQCTbTQCQbTQEkZQDBpZQDIkFQCVqZQBLsZQCXeMFQBTnbQCXfZ19__lambda_L503_C20_2TQBSlTQBUyTbTSQDLxQDLyQDLwQDLv__TQDLrTSQDMwQDMxQDMqQDLj__TQDNbSQDNuQDNvQDNoQDLf__TQDKwTSQDOtQDOuQDOnQDNg__TQDOySQDPrQDPsQDPqQDPp__TQDLfVQDKqa2_2573Vii250TQDEdZ17__lambda_L40_C3_7TQDFcZQDKwZQBVnTQEdZQDLkZQDSfFQGoZQBVmZQDMdZQDSyFQDJgZQBWgZ17__lambda_L40_C3_8TQDLeZQDNwZQBYnTQDRzTQDOcZQDOqZQDVlFQDVaZQBYtZQDYbFKQDXqMxAaQDXoZk
>
> base-commit: edf0ae3b9f1ca23daaa55628683edc9a3360286f
> -- 
> 2.25.4

Reply via email to