linux and ast <regex.h> have
        typedef int regoff_t;
this is a binary compatibility problem, not insurmountable given
        src/lib/libast/features/api => <ast_api.h>

all ast <regex.h> size-ish variables and struct members are [s]size_t -- good
some internal ast size-ish variables and struct members are [unsigned] int -- not too bad
it would take a day or so to make sure every int => size_t change was found

On Sun, 6 May 2012 23:37:41 +0200 Olga Kryzhanovska wrote:
> Glenn, this came through the perl mailing list today. I do not have a
> suitable machine but you may have one. Can you test if the libast
> regex engine can reliably match strings >2GB, please?

> Olga

> ---------- Forwarded message ----------
> From: David Leadbeater <[email protected]>
> Date: Sun, May 6, 2012 at 8:19 PM
> Subject: [perl #112790] Regexp engine cannot match >2GB strings
> To: [email protected]

> # New Ticket Created by  David Leadbeater
> # Please include the string:  [perl #112790]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=112790 >

> Matching unexpectedly fails when the string is longer than I32. The
> following fixes it, but I see a lot of I32 in the regexp engine itself so
> this might be masking other issues (see also RT #72784).

> diff --git a/pp_hot.c b/pp_hot.c
> index 89165d9..662b908 100644
> --- a/pp_hot.c
> +++ b/pp_hot.c
> @@ -1303,7 +1303,7 @@ PP(pp_match)
>        rx = PM_GETRE(pm);
>     }

> -    if (RX_MINLEN(rx) > (I32)len)
> +    if ((STRLEN)RX_MINLEN(rx) > len)
>        goto failure;

>     truebase = t = s;

> Reproduce with:

> $ perl -Mre=debug -le'$a="x"x 1048576; $b.=$a for 1 .. 2047; $b.="y"; print
> length $b; print $b =~ /y/ ? "Matched" : "No match"'
> Compiling REx "y"
> Final program:
>   1: EXACT <y> (3)
>   3: END (0)
> anchored "y" at 0 (checking anchored isall) minlen 1
> 2146435073
> Guessing start of match in sv for REx "y" against
> "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"...
> Found anchored substr "y" at offset 2146435072...
> Starting position does not contradict /^/m...
> Guessed: match at offset 2146435072
> Matched
> Freeing REx: "y"

> $ perl -Mre=debugcolor -le'$a="x"x 1048576; $b.=$a for 1 .. 2048; $b.="y";
> print length $b; print $b =~ /y/ ? "Matched" : "No match"'
> Compiling REx "y"
> Final program:
>   1: EXACT <y> (3)
>   3: END (0)
> anchored "y" at 0 (checking anchored isall) minlen 1
> 2147483649
> No match
> Freeing REx: "y"

> -- 
>       ,   _                                    _   ,
>      { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
> .----'-/`-/     [email protected]   \-`\-'----.
>  `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
>       /\/\     Solaris/BSD//C/C++ programmer   /\/\
>       `--`                                      `--`

_______________________________________________
ast-developers mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-developers
