Jeffrey Law <[email protected]> writes:

> Needs a ChangeLog entry.   Given it's referencing a Red Hat BZ entry, I 
> don't think you can or should put a BZ marker in the ChangeLog entry.  
> God only knows what'll happen if you were to include that and the bot 
> would try to update that invalid bug # in the GCC bug database.

An updated patch is below.  The only change is the addition of a
ChangeLog entry to the end of the commit message.

Thanks,
Andrew

---

commit 00f59c74236825cfc211c9dce4df5349402c5371
Author: Andrew Burgess <[email protected]>
Date:   Tue Apr 14 11:20:55 2026 +0100

    libiberty: avoid exponential back reference issue in D demangler
    
    We had a bug reported against Fedora GDB where a particular D symbol
    was causing an out of memory error.  I believe that the symbol is
    valid (see below for details), but I'm not sure if it is possible to
    successfully demangle it, so this patch aims to avoid the out of
    memory error.
    
    The symbol in question can be found in the newly added test case.
    
    The symbol in question makes heavy use of back references to compress
    the demangled string.  The back references are nested to ~57 levels of
    depth, but that's not the real problem.  The real issue is that many
    of these nested levels of back reference refer, multiple times, to the
    earlier levels.  If the innermost level of back reference results in a
    1 byte string (it doesn't, the resulting string is longer), and each
    level of back reference refers to the next inner level just twice
    (it's usually more), then by 30 levels the resulting string is 1G in
    size.  This is an absolute lower bound of the required size.  By 40
    levels we have, at a minimum, a 1024G string.
    
    My plan for fixing this is to count nested back references, resetting
    the counter at the "top level", i.e. at a nesting depth of 0.  If we
    see too many (based on some arbitrary limit) nested back references,
    then we'll abort the demangle, and return NULL.
    
    Now, this isn't purely a recursion problem.  The theoretical maximum
    depth is just 57, but the problem is the cumulative work that ends up
    being done due to the exponential growth of each level referencing the
    earlier levels.
    
    However, I wanted to avoid adding a separate flag for this to the
    demangler, and I figured that this problem is close enough to
    recursion that we can reuse the existing recursion limit flag here
    too.
    
    And so, the solution I propose is that each time we encounter a nested
    back reference we increment a counter.  If we see more than
    DEMANGLE_RECURSION_LIMIT nested back references then we abort the
    demangle, even if these back references are not all nested one inside
    the other.  A second counter tracks the current depth of nested back
    references.  If we are at depth 0 then the first counter, the nested
    back reference counter, is reset back to 0.
    
    With this fix in place, when passed the problematic mangled symbol,
    the demangler returns NULL.  The c++filt tool will just re-print the
    mangled symbol.  The D demangler tests are extended to include the
    problematic symbol.
    
    Side note: I mentioned above that I think this symbol is valid.  To
    reach this conclusion I implemented a hack where I cached the result
    of evaluating a back reference.  The first time a back reference is
    processed I actually processed the input to check the string demangled
    correctly, but I never added this demangled string to the final
    result.  The second time the same back reference is requested, I spot
    the cached entry and return, again adding nothing to the result string.
    
    Thus, the final result string contains only the content from the top
    level.  None of the back referenced content is included.  The input
    string (mangled) is 2695 bytes, while the output string was 42kB, but
    this skips all the nested content.  However, we did get to the end of
    the input string, and returned a non-NULL value.  The full string
    would be multiple GB given the exponential growth.
    
    Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2368350
    
    libiberty/ChangeLog:
    
            * d-demangle.c (struct dlang_info::options): New field,
            carries options passed to the demangler.
            (struct dlang_info::num_backrefs): Count total number of back
            referenced types that have been processed.
            (struct dlang_info::backref_depth): Count the depth of
            recursive back references.
            (dlang_type_backref): Track recursive back reference depth,
            and the total number of back references encountered.  Bail out
            early if the total number gets too high.
            (dlang_demangle_init_info): Initialise new 'struct dlang_info'
            fields.
            (dlang_demangle): Pass demangler options to
            dlang_demangle_init_info.
            * testsuite/d-demangle-expected: Add new test.

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index f059f5690bb..db179244519 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -175,8 +175,33 @@ struct dlang_info
 {
   /* The string we are demangling.  */
   const char *s;
+
+  /* The options passed to the demangler.  */
+  int options;
+
   /* The index of the last back reference.  */
   int last_backref;
+
+  /* The total number of back referenced types, including nested back
+     referenced types.  This is reset to zero each time a back
+     reference is processed at the "top level" (i.e. not a nested back
+     reference).
+
+     Cases have been encountered where a back reference type includes
+     references to other back reference types, which include further
+     back references.  This was observed to a depth of 57.  If each
+     layer only references the layer before twice then the innermost
+     string will be duplicated 2^57 times!
+
+     We count the number of nested back references and compare this to
+     the recursion limit, even though this isn't strictly recursion.  */
+  int num_backrefs;
+
+  /* Track the depth of back references.  Depth 0 is considered the
+     top level.  When we encounter a back reference at this depth the
+     NUM_BACKREFS field is reset to 0.  At greater depths,
+     NUM_BACKREFS will be incremented.  */
+  int backref_depth;
 };
 
 /* Pass as the LEN to dlang_parse_template if symbol length is not known.  */
@@ -411,6 +436,24 @@ dlang_type_backref (string *decl, const char *mangled, 
struct dlang_info *info,
   if (mangled - info->s >= info->last_backref)
     return NULL;
 
+  /* A back reference might point to a type that itself contains back
+     references, which might themselves contain back references.
+     There might not be recursion going on, but we can run into
+     problems of exponential growth caused by the inner type appearing
+     to be referenced 10s of millions of times.  */
+  if (info->backref_depth > 0)
+    {
+      if ((info->options & DMGL_NO_RECURSE_LIMIT) == 0
+         && info->num_backrefs > DEMANGLE_RECURSION_LIMIT)
+       return NULL;
+
+      info->num_backrefs++;
+    }
+  else
+    info->num_backrefs = 0;
+
+  info->backref_depth++;
+
   int save_refpos = info->last_backref;
   info->last_backref = mangled - info->s;
 
@@ -424,6 +467,7 @@ dlang_type_backref (string *decl, const char *mangled, 
struct dlang_info *info,
     backref = dlang_type (decl, backref, info);
 
   info->last_backref = save_refpos;
+  info->backref_depth--;
 
   if (backref == NULL)
     return NULL;
@@ -1931,17 +1975,20 @@ dlang_parse_template (string *decl, const char *mangled,
 /* Initialize the information structure we use to pass around information.  */
 static void
 dlang_demangle_init_info (const char *mangled, int last_backref,
-                         struct dlang_info *info)
+                         int options, struct dlang_info *info)
 {
   info->s = mangled;
+  info->options = options;
   info->last_backref = last_backref;
+  info->num_backrefs = 0;
+  info->backref_depth = 0;
 }
 
 /* Extract and demangle the symbol in MANGLED.  Returns the demangled
    signature on success or NULL on failure.  */
 
 char *
-dlang_demangle (const char *mangled, int option ATTRIBUTE_UNUSED)
+dlang_demangle (const char *mangled, int options)
 {
   string decl;
   char *demangled = NULL;
@@ -1962,7 +2009,7 @@ dlang_demangle (const char *mangled, int option 
ATTRIBUTE_UNUSED)
     {
       struct dlang_info info;
 
-      dlang_demangle_init_info (mangled, strlen (mangled), &info);
+      dlang_demangle_init_info (mangled, strlen (mangled), options, &info);
       mangled = dlang_parse_mangle (&decl, mangled, &info);
 
       /* Check that the entire symbol was successfully demangled.  */
diff --git a/libiberty/testsuite/d-demangle-expected 
b/libiberty/testsuite/d-demangle-expected
index cfbdf2a52cb..8c671b62316 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -1475,3 +1475,15 @@ demangle.anonymous.foo
 --format=auto
 _D8demangle9anonymous03fooZ
 demangle.anonymous.foo
+#
+# This symbol was seen in the wild.  It has a pattern of back
+# references that result in exponential growth, with the innermost
+# type being referenced 10s of millions of times.  We use the
+# demangler's recursion limit to place a hard cap on back reference
+# usage, which causes the demangler to safely bail out before
+# consuming an excessive amount of memory.  Trying to demangle this
+# without the recursion limit in place will eventually cause the OOM
+# killer to kick in.
+--format=dlang
+_D3std6format5write__T14formattedWriteTDFAxaZvTaTS2ae5utils4text7functor__T13stringifiableTSQBqQBqQBi10primitives__TQCaSQCsQCsQCk11composition__T3seqTSQDxQDxQDpQCh__TQDySQEqQEqQEnQEl__T17formattingFunctorVAyaa2_2573Vii250TQrZ15__lambda_L40_C3TQBmZ7FunctorTSQHzQHzQHrQGj__TQIaSQIsQIsQIpQIn__TQEcVQDma2_2573Vii250TSQKdQKdQKaQJy__TQJtTSQKxQKxQKpQJh__TQKyS4btdu2ui6curses6Curses4Wand__T8withAttrTSQNfQNfQNcQNa__TQMvTSQNzQNzQNrQMj__TQOaSQOsQOsQOkQMa__T6selectTSQPqQPqQPiQOa__TQPrSQQjQQjQQbQOt__T12valueFunctorTbZQrFbZ17__lambda_L143_C57TbZQLbTSQSvQSvQSnQRf__TQSwSQToQToQTlQTj__T5fmtIfVQOlnTbTPFNaNbNiNfZQPdTSQVhQVhQUzQTr__TQViSQKkQKi7browser7Browser6updateMFZ__T17__lambda_L1185_C4TQSeTSQYiQYiQYfQYd__TQXyTSQZcQZcQYuQXm__TQZdSQZvQZvQZnQXd__TQLdTQKzTSQBAtQBAuQBAnQZg__TQBAxSQBBqQBBrQBBpQBBo__TQIgVQWonTbTSQBCvQBCwQBCpQBBi__TQBDaSQBDtQBDuQBDnQBCg__TQNoTSQBErQBEsQBEqQBEp__TQBElTSQBFqQBFrQBFkQBEd__TQBFvSQBGoQBGpQBGiQBDz__TQBDqTSQBHnQBHoQBHhQBGa__TQBHsSQBIlQBImQBIkQBIj__TQBDzVQBDka2_2573Vii250TSQBKcQBKdQBKbQBKa__TQBJwTSQBLbQBLcQBKvQBJo__TQBLgSQBAjQBAiQBAjQBAgQBAd__TQBAcTQBHcTSQBNhQBNiQBNgQBNf__TQBNbTSQBOgQBOhQBOaQBMt__TQBOlSQBDoQBDnQBDoQBDlQBDi__TQBDhTQBKhZQBDrMFEQBFcQBFbQBFcQBEzQBEw9AttributebKQBLzZ19__lambda_L503_C20_1TPSQBHlQBHkQBHlQBHiQBHfTQDfTbTSQBUdQBUeQBUcQBUb__TQBTxTSQBVcQBVdQBUwQBTp__TQBVhSQBWaQBWbQBVuQBTl__TQBTcTQBTcZ3funTQBTmZQBPyZQBWtFQCwZ13StringifiableZQBRcZQBXxFQKwZQBeTQBTkZQBMuMFQJdbQBTzQMwQBUgZ17__lambda_L503_C20TQIfTQKrTbTSQCBpQCBqQCBoQCBn__TQCBjTSQCCoQCCpQCCiQCBb__TQCCtSQCDmQCDnQCDgQCAx__TQCAoTQCAoTSQCEqQCErQCEkQCDd__TQCEvSQCFoQCFpQCFnQCFm__TQCBcVQCAna2_2573Vii250TQTyZ17__lambda_L40_C3_1TQUwZQCArTQCEpZQLnTQCEyTQEkTQCFhZQCBtZQCIoFQHfZQLvZQCClZQCJgFQZkZQMnZ17__lambda_L40_C3_3TQBBlZQCEcTQCIaTQBEpZQPdTQBEyTQCItTQBFiZQCFkZQCMfFQBHuZQPnZQBWuFQBJhZ19__lambda_L143_C57_1TQBKiZQCHoTSQCPjQCPkQCPdQCNw__TQCPoSQCQhQCQiQCQbQCOu__TQCAcTQBHeZQCAmFQBHoZ19__lambda_L143_C57_2TQBIpZQCLgZQBYxFNibQBQoQEeZQWnTQBRaZQCMkTSQCUfQCUgQCTzQCSs__TQCUkSQCVdQCVeQCVcQCVb__TQCBtVQCQcnTbTQBTpTQHgZQCCpFNibQBUgQHwZQBAfTQIfZQCQcZQBAtTQCImTQBXoTQEhZQCRaZQCXvFQBZyZQBBdZQCCaMFQCTqQCBmbSQCOjQCOiQCEbQCDx13ScrollContextPSQCPq5paths11BrowserPathZ20__lambda_L1195_C76_1TQBxZQCVqZQCJhFNibQCJbQCIqZQBGyTQCJpZQCWwTSQDErQDEsQDElQDDe__TQDEwSQDFpQDFqQDFoQDFn__TQCMfVQDAonTbTQCMeTQCLuZQCNcFNibQCMwQCMlZQBKtTQCMvZQDArZQBLiTQCTbTQCQbTQEkZQDBpZQDIkFQCVqZQBLsZQCXeMFQBTnbQCXfZ19__lambda_L503_C20_2TQBSlTQBUyTbTSQDLxQDLyQDLwQDLv__TQDLrTSQDMwQDMxQDMqQDLj__TQDNbSQDNuQDNvQDNoQDLf__TQDKwTSQDOtQDOuQDOnQDNg__TQDOySQDPrQDPsQDPqQDPp__TQDLfVQDKqa2_2573Vii250TQDEdZ17__lambda_L40_C3_7TQDFcZQDKwZQBVnTQEdZQDLkZQDSfFQGoZQBVmZQDMdZQDSyFQDJgZQBWgZ17__lambda_L40_C3_8TQDLeZQDNwZQBYnTQDRzTQDOcZQDOqZQDVlFQDVaZQBYtZQDYbFKQDXqMxAaQDXoZk
+_D3std6format5write__T14formattedWriteTDFAxaZvTaTS2ae5utils4text7functor__T13stringifiableTSQBqQBqQBi10primitives__TQCaSQCsQCsQCk11composition__T3seqTSQDxQDxQDpQCh__TQDySQEqQEqQEnQEl__T17formattingFunctorVAyaa2_2573Vii250TQrZ15__lambda_L40_C3TQBmZ7FunctorTSQHzQHzQHrQGj__TQIaSQIsQIsQIpQIn__TQEcVQDma2_2573Vii250TSQKdQKdQKaQJy__TQJtTSQKxQKxQKpQJh__TQKyS4btdu2ui6curses6Curses4Wand__T8withAttrTSQNfQNfQNcQNa__TQMvTSQNzQNzQNrQMj__TQOaSQOsQOsQOkQMa__T6selectTSQPqQPqQPiQOa__TQPrSQQjQQjQQbQOt__T12valueFunctorTbZQrFbZ17__lambda_L143_C57TbZQLbTSQSvQSvQSnQRf__TQSwSQToQToQTlQTj__T5fmtIfVQOlnTbTPFNaNbNiNfZQPdTSQVhQVhQUzQTr__TQViSQKkQKi7browser7Browser6updateMFZ__T17__lambda_L1185_C4TQSeTSQYiQYiQYfQYd__TQXyTSQZcQZcQYuQXm__TQZdSQZvQZvQZnQXd__TQLdTQKzTSQBAtQBAuQBAnQZg__TQBAxSQBBqQBBrQBBpQBBo__TQIgVQWonTbTSQBCvQBCwQBCpQBBi__TQBDaSQBDtQBDuQBDnQBCg__TQNoTSQBErQBEsQBEqQBEp__TQBElTSQBFqQBFrQBFkQBEd__TQBFvSQBGoQBGpQBGiQBDz__TQBDqTSQBHnQBHoQBHhQBGa__TQBHsSQBIlQBImQBIkQBIj__TQBDzVQBDka2_2573Vii250TSQBKcQBKdQBKbQBKa__TQBJwTSQBLbQBLcQBKvQBJo__TQBLgSQBAjQBAiQBAjQBAgQBAd__TQBAcTQBHcTSQBNhQBNiQBNgQBNf__TQBNbTSQBOgQBOhQBOaQBMt__TQBOlSQBDoQBDnQBDoQBDlQBDi__TQBDhTQBKhZQBDrMFEQBFcQBFbQBFcQBEzQBEw9AttributebKQBLzZ19__lambda_L503_C20_1TPSQBHlQBHkQBHlQBHiQBHfTQDfTbTSQBUdQBUeQBUcQBUb__TQBTxTSQBVcQBVdQBUwQBTp__TQBVhSQBWaQBWbQBVuQBTl__TQBTcTQBTcZ3funTQBTmZQBPyZQBWtFQCwZ13StringifiableZQBRcZQBXxFQKwZQBeTQBTkZQBMuMFQJdbQBTzQMwQBUgZ17__lambda_L503_C20TQIfTQKrTbTSQCBpQCBqQCBoQCBn__TQCBjTSQCCoQCCpQCCiQCBb__TQCCtSQCDmQCDnQCDgQCAx__TQCAoTQCAoTSQCEqQCErQCEkQCDd__TQCEvSQCFoQCFpQCFnQCFm__TQCBcVQCAna2_2573Vii250TQTyZ17__lambda_L40_C3_1TQUwZQCArTQCEpZQLnTQCEyTQEkTQCFhZQCBtZQCIoFQHfZQLvZQCClZQCJgFQZkZQMnZ17__lambda_L40_C3_3TQBBlZQCEcTQCIaTQBEpZQPdTQBEyTQCItTQBFiZQCFkZQCMfFQBHuZQPnZQBWuFQBJhZ19__lambda_L143_C57_1TQBKiZQCHoTSQCPjQCPkQCPdQCNw__TQCPoSQCQhQCQiQCQbQCOu__TQCAcTQBHeZQCAmFQBHoZ19__lambda_L143_C57_2TQBIpZQCLgZQBYxFNibQBQoQEeZQWnTQBRaZQCMkTSQCUfQCUgQCTzQCSs__TQCUkSQCVdQCVeQCVcQCVb__TQCBtVQCQcnTbTQBTpTQHgZQCCpFNibQBUgQHwZQBAfTQIfZQCQcZQBAtTQCImTQBXoTQEhZQCRaZQCXvFQBZyZQBBdZQCCaMFQCTqQCBmbSQCOjQCOiQCEbQCDx13ScrollContextPSQCPq5paths11BrowserPathZ20__lambda_L1195_C76_1TQBxZQCVqZQCJhFNibQCJbQCIqZQBGyTQCJpZQCWwTSQDErQDEsQDElQDDe__TQDEwSQDFpQDFqQDFoQDFn__TQCMfVQDAonTbTQCMeTQCLuZQCNcFNibQCMwQCMlZQBKtTQCMvZQDArZQBLiTQCTbTQCQbTQEkZQDBpZQDIkFQCVqZQBLsZQCXeMFQBTnbQCXfZ19__lambda_L503_C20_2TQBSlTQBUyTbTSQDLxQDLyQDLwQDLv__TQDLrTSQDMwQDMxQDMqQDLj__TQDNbSQDNuQDNvQDNoQDLf__TQDKwTSQDOtQDOuQDOnQDNg__TQDOySQDPrQDPsQDPqQDPp__TQDLfVQDKqa2_2573Vii250TQDEdZ17__lambda_L40_C3_7TQDFcZQDKwZQBVnTQEdZQDLkZQDSfFQGoZQBVmZQDMdZQDSyFQDJgZQBWgZ17__lambda_L40_C3_8TQDLeZQDNwZQBYnTQDRzTQDOcZQDOqZQDVlFQDVaZQBYtZQDYbFKQDXqMxAaQDXoZk

Reply via email to