Glenn, this came through the perl mailing list today. I do not have a
suitable machine but you may have one. Can you test if the libast
regex engine can reliably match strings >2GB, please?

Olga

---------- Forwarded message ----------
From: David Leadbeater <[email protected]>
Date: Sun, May 6, 2012 at 8:19 PM
Subject: [perl #112790] Regexp engine cannot match >2GB strings
To: [email protected]


# New Ticket Created by  David Leadbeater
# Please include the string:  [perl #112790]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=112790 >


Matching unexpectedly fails when the string's length does not fit in an
I32, i.e. when it exceeds 2**31 - 1 bytes. The following fixes it, but I
see a lot of I32 in the regexp engine itself, so this might be masking
other issues (see also RT #72784).

diff --git a/pp_hot.c b/pp_hot.c
index 89165d9..662b908 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -1303,7 +1303,7 @@ PP(pp_match)
       rx = PM_GETRE(pm);
    }

-    if (RX_MINLEN(rx) > (I32)len)
+    if ((STRLEN)RX_MINLEN(rx) > len)
       goto failure;

    truebase = t = s;

Reproduce with:

$ perl -Mre=debug -le'$a="x"x 1048576; $b.=$a for 1 .. 2047; $b.="y"; print
length $b; print $b =~ /y/ ? "Matched" : "No match"'
Compiling REx "y"
Final program:
  1: EXACT <y> (3)
  3: END (0)
anchored "y" at 0 (checking anchored isall) minlen 1
2146435073
Guessing start of match in sv for REx "y" against
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"...
Found anchored substr "y" at offset 2146435072...
Starting position does not contradict /^/m...
Guessed: match at offset 2146435072
Matched
Freeing REx: "y"

$ perl -Mre=debugcolor -le'$a="x"x 1048576; $b.=$a for 1 .. 2048; $b.="y";
print length $b; print $b =~ /y/ ? "Matched" : "No match"'
Compiling REx "y"
Final program:
  1: EXACT <y> (3)
  3: END (0)
anchored "y" at 0 (checking anchored isall) minlen 1
2147483649
No match
Freeing REx: "y"



-- 
      ,   _                                    _   ,
     { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
.----'-/`-/     [email protected]   \-`\-'----.
 `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer   /\/\
      `--`                                      `--`

_______________________________________________
ast-developers mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-developers
