another way where SpamAssassin rules could be used to execute
arbitrary code...

--j.

------- Forwarded Message

Date:    Thu, 15 Nov 2007 21:17:28 +0000
From:    Nicholas Clark <[EMAIL PROTECTED]>
To:      [EMAIL PROTECTED]
Subject: Security issues in the Perl core

--zx4FCpZtqtKETZ7O
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

As people may have become aware, security researches at Google discovered a
buffer overflow in the regexp engine. As best I can tell, they reported it
to Linux vendors, asked them what the appropriate security contact address
for Perl 5 was, were given Yves' address, and Yves forwarded it to the Perl
5 committers.

And after discussing a patch that needed backporting from 5.10 (where it was
already fixed) to 5.8, THAT WAS THE LAST WE HEARD.

The next "contact" we had was discovering that the Linux vendors had made
public security announcements, without even notifying us, let alone
discussing a timescale.

I consider this outcome neither professional nor courteous, but hope that it
was caused by an unfortunate series of events that won't re-occur.

If I have made any mistakes here, the Linux vendors reading this list are
welcome to correct me. And given that they all ship perl, they all should
be reading this list.




Specifically, the bug is that if you have a pattern which in itself uses no
Unicode characters, but matches Unicode characters (for example \x{} capes)
then at compile time the regexp engine will allocate memory assuming an 8
bit representation on the first pass. However, if the pattern also has 8 bit
characters, then when the Unicode characters are compiled (on the second pass)
any existing 8 bit characters will be converted to UTF-8 representation,
which is likely to be a buffer overflow. Matches will also fail.

Given that it's been present since 5.8.0 (July 2002) and the bug itself (but
not the security implication) wasn't reported until early this year*, I don't
think that it's that likely to crop up in the wild.

Redhat's announcement is https://rhn.redhat.com/errata/RHSA-2007-0966.html

which I believe is unclear in its wording:

    Specially crafted input to a regular expression can cause Perl to
    improperly allocate memory, possibly resulting in arbitrary code running
    with the permissions of the user running Perl.


The "input" is the pattern, not the matched string. So it's not going to be
an issue at all, unless your programmers are foolish enough to allow
untrusted user input to be interpolated into regular expressions. In which
case, you were already open to denial of service attacks from patterns that
bust the C stack (fixed by Dave for 5.10) or take until the heat death of the
universe to complete (inherently unfixable in a general purpose programming
language)

The CVE announcement is http://nvd.nist.gov/nvd.cfm?cvename=CVE-2007-5116

It's terse, and has the same ambiguity:

    Buffer overflow in the polymorphic opcode support in the Regular
    Expression Engine (regcomp.c) in Perl 5.8 allows context-dependent
    attackers to execute arbitrary code by switching from byte to Unicode
    (UTF) characters in a regular expression.

Yes, conceivably you can now inject arbitrary code. But if you did, you
program (not perl) was badly written, and already had the ability to be
crashed or hung.


So, anyway. Robert has set up [EMAIL PROTECTED]

It's an address for reporting core security issues. That's all.

If you're a vendor, PLEASE USE IT. And please keep it on the Cc.
There is now NO EXCUSE.


Currently it points to a closed subscription unarchived mailing list. Right
now the subscribers to that list are (basically) the current Perl 5
committers. We'd welcome anyone competent to request to subscribe.

It's likely not to be high traffic - right now we seem to average 1 security
issue every 2 years, but we'd really like more people on it so that there's
a good chance that at least one of the subscribers will have the time to
respond to any initial report within 24 hours, at least to say roughly:

    "thanks for the report. I can confirm that this is a bug. We're looking
     into how to resolve it, and we'll get back to you"


Right now I don't feel comfortable that we have enough volunteers to get 99%
reliability of a reply within 24 hours, 7 days a week.


Meanwhile, attached are patches that resolve the issue, for all released
versions of Perl 5.8.x. They're context diffs (not unified diffs) with
symmetric context hunks, and a hand crafted patchlevel.h hunk.
Based on previous iterations of patches in 2005, even the fussiest vendor
supplied "patch" program coped with this format.

regexp-5.8.8.patch applies to 5.8.8
regexp-5.8.7.patch applies to 5.8.7, 5.8.6, 5.8.5
regexp-5.8.4.patch applies to 5.8.4
regexp-5.8.3.patch applies to 5.8.3, 5.8.2, 5.8.1
regexp-5.8.0.patch applies to 5.8.0

Please could people test them on any platform or perl to hand.
I'm particularly interested in whether they apply cleanly on non *BSD non
Linux systems. Oh, and whether no tests fail that weren't already failing.
(I'm seeing a find test failing on 5.8.0-5.8.3 on Linux, but not FreeBSD)

Nicholas Clark

* I believe against blead, not 5.8.x, by Jeurd, while at the German Perl
  Workshop, when trying to write a program to convert BNF grammars to 5.10
  regexps. If so, then you had to push the engine pretty far to get to it.
--zx4FCpZtqtKETZ7O
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="regexp-5.8.8.patch"

diff -rc perl-5.8.8/patchlevel.h perl-5.8.8.patched/patchlevel.h
*** perl-5.8.8/patchlevel.h     Tue Jan 31 16:12:10 2006
--- perl-5.8.8.patched/patchlevel.h     Thu Nov 15 16:49:41 2007
***************
*** 124 ****
!       ,NULL
--- 124,125 ----
!       ,"REGEXP0 - fix for UTF-8 recoding in regexps - CVE-2007-5116"
!       ,NULL
diff -rc perl-5.8.8/regcomp.c perl-5.8.8.patched/regcomp.c
*** perl-5.8.8/regcomp.c        Sun Jan  8 20:59:27 2006
--- perl-5.8.8.patched/regcomp.c        Thu Nov 15 16:38:53 2007
***************
*** 135,141 ****
      I32               extralen;
      I32               seen_zerolen;
      I32               seen_evals;
!     I32               utf8;
  #if ADD_TO_REGEXEC
      char      *starttry;              /* -Dr: where regtry was called. */
  #define RExC_starttry (pRExC_state->starttry)
--- 135,144 ----
      I32               extralen;
      I32               seen_zerolen;
      I32               seen_evals;
!     I32               utf8;           /* whether the pattern is utf8 or not *
/
!     I32               orig_utf8;      /* whether the pattern was originally i
n utf8 */
!                               /* XXX use this for future optimisation of case
!                                * where pattern must be upgraded to utf8. */
  #if ADD_TO_REGEXEC
      char      *starttry;              /* -Dr: where regtry was called. */
  #define RExC_starttry (pRExC_state->starttry)
***************
*** 161,166 ****
--- 164,170 ----
  #define RExC_seen_zerolen     (pRExC_state->seen_zerolen)
  #define RExC_seen_evals       (pRExC_state->seen_evals)
  #define RExC_utf8     (pRExC_state->utf8)
+ #define RExC_orig_utf8        (pRExC_state->orig_utf8)
  
  #define       ISMULT1(c)      ((c) == '*' || (c) == '+' || (c) == '?')
  #define       ISMULT2(s)      ((*s) == '*' || (*s) == '+' || (*s) == '?' || \
***************
*** 1749,1763 ****
      if (exp == NULL)
        FAIL("NULL regexp argument");
  
!     RExC_utf8 = pm->op_pmdynflags & PMdf_CMP_UTF8;
  
-     RExC_precomp = exp;
      DEBUG_r({
         if (!PL_colorset) reginitcolors();
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
                       PL_colors[4],PL_colors[5],PL_colors[0],
!                      (int)(xend - exp), RExC_precomp, PL_colors[1]);
      });
      RExC_flags = pm->op_pmflags;
      RExC_sawback = 0;
  
--- 1753,1769 ----
      if (exp == NULL)
        FAIL("NULL regexp argument");
  
!     RExC_utf8 = RExC_orig_utf8 = pm->op_pmdynflags & PMdf_CMP_UTF8;
  
      DEBUG_r({
         if (!PL_colorset) reginitcolors();
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
                       PL_colors[4],PL_colors[5],PL_colors[0],
!                      (int)(xend - exp), exp, PL_colors[1]);
      });
+ 
+ redo_first_pass:
+     RExC_precomp = exp;
      RExC_flags = pm->op_pmflags;
      RExC_sawback = 0;
  
***************
*** 1782,1787 ****
--- 1788,1812 ----
      if (reg(pRExC_state, 0, &flags) == NULL) {
        RExC_precomp = Nullch;
        return(NULL);
+     }
+     if (RExC_utf8 && !RExC_orig_utf8) {
+         /* It's possible to write a regexp in ascii that represents unicode
+         codepoints outside of the byte range, such as via \x{100}. If we
+         detect such a sequence we have to convert the entire pattern to utf8
+         and then recompile, as our sizing calculation will have been based
+         on 1 byte == 1 character, but we will need to use utf8 to encode
+         at least some part of the pattern, and therefore must convert the who
le
+         thing.
+         XXX: somehow figure out how to make this less expensive...
+         -- dmq */
+         STRLEN len = xend-exp;
+         DEBUG_r(PerlIO_printf(Perl_debug_log,
+           "UTF8 mismatch! Converting to utf8 for resizing and compile\n"));
+         exp = (char*)Perl_bytes_to_utf8(aTHX_ (U8*)exp, &len);
+         xend = exp + len;
+         RExC_orig_utf8 = RExC_utf8;
+         SAVEFREEPV(exp);
+         goto redo_first_pass;
      }
      DEBUG_r(PerlIO_printf(Perl_debug_log, "size %"IVdf" ", (IV)RExC_size));
  
diff -rc perl-5.8.8/t/op/pat.t perl-5.8.8.patched/t/op/pat.t
*** perl-5.8.8/t/op/pat.t       Sat Jan  7 12:53:32 2006
--- perl-5.8.8.patched/t/op/pat.t       Thu Nov 15 16:45:18 2007
***************
*** 6,12 ****
  
  $| = 1;
  
! print "1..1187\n";
  
  BEGIN {
      chdir 't' if -d 't';
--- 6,12 ----
  
  $| = 1;
  
! print "1..1189\n";
  
  BEGIN {
      chdir 't' if -d 't';
***************
*** 3394,3399 ****
--- 3394,3408 ----
      ok($s eq 'cd',
         "# assigning to original string should not corrupt match vars");
  }
+  
+ {
+     use warnings;
+     my @w;
+     local $SIG{__WARN__}=sub{push @w,"@_"};
+     my $c=qq(\x{DF}); 
+     ok($c=~/${c}|\x{100}/, "ASCII pattern that really is utf8");
+     ok(@w==0, "No warnings");
+ }    
  
  # last test 1187
  

--zx4FCpZtqtKETZ7O
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="regexp-5.8.7.patch"

diff -rc perl-5.8.7/patchlevel.h perl-5.8.7.patched/patchlevel.h
*** perl-5.8.8/patchlevel.h     Tue Jan 31 16:12:10 2006
--- perl-5.8.8.patched/patchlevel.h     Thu Nov 15 16:49:41 2007
***************
*** 123 ****
!       ,NULL
--- 123,124 ----
!       ,"REGEXP0 - fix for UTF-8 recoding in regexps - CVE-2007-5116"
!       ,NULL
diff -rc perl-5.8.7/regcomp.c perl-5.8.7.patched/regcomp.c
*** perl-5.8.7/regcomp.c        Sat Feb  5 13:55:06 2005
--- perl-5.8.7.patched/regcomp.c        Thu Nov 15 17:02:46 2007
***************
*** 136,142 ****
      I32               extralen;
      I32               seen_zerolen;
      I32               seen_evals;
!     I32               utf8;
  #if ADD_TO_REGEXEC
      char      *starttry;              /* -Dr: where regtry was called. */
  #define RExC_starttry (pRExC_state->starttry)
--- 136,145 ----
      I32               extralen;
      I32               seen_zerolen;
      I32               seen_evals;
!     I32               utf8;           /* whether the pattern is utf8 or not *
/
!     I32               orig_utf8;      /* whether the pattern was originally i
n utf8 */
!                               /* XXX use this for future optimisation of case
!                                * where pattern must be upgraded to utf8. */
  #if ADD_TO_REGEXEC
      char      *starttry;              /* -Dr: where regtry was called. */
  #define RExC_starttry (pRExC_state->starttry)
***************
*** 162,167 ****
--- 165,171 ----
  #define RExC_seen_zerolen     (pRExC_state->seen_zerolen)
  #define RExC_seen_evals       (pRExC_state->seen_evals)
  #define RExC_utf8     (pRExC_state->utf8)
+ #define RExC_orig_utf8        (pRExC_state->orig_utf8)
  
  #define       ISMULT1(c)      ((c) == '*' || (c) == '+' || (c) == '?')
  #define       ISMULT2(s)      ((*s) == '*' || (*s) == '+' || (*s) == '?' || \
***************
*** 1758,1772 ****
      if (exp == NULL)
        FAIL("NULL regexp argument");
  
!     RExC_utf8 = pm->op_pmdynflags & PMdf_CMP_UTF8;
  
-     RExC_precomp = exp;
      DEBUG_r({
         if (!PL_colorset) reginitcolors();
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
                       PL_colors[4],PL_colors[5],PL_colors[0],
!                      (int)(xend - exp), RExC_precomp, PL_colors[1]);
      });
      RExC_flags = pm->op_pmflags;
      RExC_sawback = 0;
  
--- 1762,1778 ----
      if (exp == NULL)
        FAIL("NULL regexp argument");
  
!     RExC_utf8 = RExC_orig_utf8 = pm->op_pmdynflags & PMdf_CMP_UTF8;
  
      DEBUG_r({
         if (!PL_colorset) reginitcolors();
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
                       PL_colors[4],PL_colors[5],PL_colors[0],
!                      (int)(xend - exp), exp, PL_colors[1]);
      });
+ 
+ redo_first_pass:
+     RExC_precomp = exp;
      RExC_flags = pm->op_pmflags;
      RExC_sawback = 0;
  
***************
*** 1791,1796 ****
--- 1797,1821 ----
      if (reg(pRExC_state, 0, &flags) == NULL) {
        RExC_precomp = Nullch;
        return(NULL);
+     }
+     if (RExC_utf8 && !RExC_orig_utf8) {
+         /* It's possible to write a regexp in ascii that represents unicode
+         codepoints outside of the byte range, such as via \x{100}. If we
+         detect such a sequence we have to convert the entire pattern to utf8
+         and then recompile, as our sizing calculation will have been based
+         on 1 byte == 1 character, but we will need to use utf8 to encode
+         at least some part of the pattern, and therefore must convert the who
le
+         thing.
+         XXX: somehow figure out how to make this less expensive...
+         -- dmq */
+         STRLEN len = xend-exp;
+         DEBUG_r(PerlIO_printf(Perl_debug_log,
+           "UTF8 mismatch! Converting to utf8 for resizing and compile\n"));
+         exp = (char*)Perl_bytes_to_utf8(aTHX_ (U8*)exp, &len);
+         xend = exp + len;
+         RExC_orig_utf8 = RExC_utf8;
+         SAVEFREEPV(exp);
+         goto redo_first_pass;
      }
      DEBUG_r(PerlIO_printf(Perl_debug_log, "size %"IVdf" ", (IV)RExC_size));
  
diff -rc perl-5.8.7/t/op/pat.t perl-5.8.7.patched/t/op/pat.t
*** perl-5.8.7/t/op/pat.t       Sat May 29 21:25:25 2004
--- perl-5.8.7.patched/t/op/pat.t       Thu Nov 15 17:03:23 2007
***************
*** 6,12 ****
  
  $| = 1;
  
! print "1..1065\n";
  
  BEGIN {
      chdir 't' if -d 't';
--- 6,12 ----
  
  $| = 1;
  
! print "1..1067\n";
  
  BEGIN {
      chdir 't' if -d 't';
***************
*** 3256,3261 ****
--- 3256,3270 ----
        );
      }
  }
+  
+ {
+     use warnings;
+     my @w;
+     local $SIG{__WARN__}=sub{push @w,"@_"};
+     my $c=qq(\x{DF}); 
+     ok($c=~/${c}|\x{100}/, "ASCII pattern that really is utf8");
+     ok(@w==0, "No warnings");
+ }    
  
  # perl #25269: panic: pp_match start/end pointers
  ok("a-bc" eq eval {

--zx4FCpZtqtKETZ7O
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="regexp-5.8.4.patch"

diff -rc perl-5.8.4/patchlevel.h perl-5.8.4.patched/patchlevel.h
*** perl-5.8.4/patchlevel.h     Wed Apr 21 20:35:59 2004
--- perl-5.8.4.patched/patchlevel.h     Thu Nov 15 17:08:22 2007
***************
*** 123 ****
!       ,NULL
--- 123,124 ----
!       ,"REGEXP0 - fix for UTF-8 recoding in regexps - CVE-2007-5116"
!       ,NULL
diff -rc perl-5.8.4/regcomp.c perl-5.8.4.patched/regcomp.c
*** perl-5.8.4/regcomp.c        Sat Nov  1 16:00:37 2003
--- perl-5.8.4.patched/regcomp.c        Thu Nov 15 17:08:22 2007
***************
*** 126,132 ****
      I32               extralen;
      I32               seen_zerolen;
      I32               seen_evals;
!     I32               utf8;
  #if ADD_TO_REGEXEC
      char      *starttry;              /* -Dr: where regtry was called. */
  #define RExC_starttry (pRExC_state->starttry)
--- 126,135 ----
      I32               extralen;
      I32               seen_zerolen;
      I32               seen_evals;
!     I32               utf8;           /* whether the pattern is utf8 or not *
/
!     I32               orig_utf8;      /* whether the pattern was originally i
n utf8 */
!                               /* XXX use this for future optimisation of case
!                                * where pattern must be upgraded to utf8. */
  #if ADD_TO_REGEXEC
      char      *starttry;              /* -Dr: where regtry was called. */
  #define RExC_starttry (pRExC_state->starttry)
***************
*** 152,157 ****
--- 155,161 ----
  #define RExC_seen_zerolen     (pRExC_state->seen_zerolen)
  #define RExC_seen_evals       (pRExC_state->seen_evals)
  #define RExC_utf8     (pRExC_state->utf8)
+ #define RExC_orig_utf8        (pRExC_state->orig_utf8)
  
  #define       ISMULT1(c)      ((c) == '*' || (c) == '+' || (c) == '?')
  #define       ISMULT2(s)      ((*s) == '*' || (*s) == '+' || (*s) == '?' || \
***************
*** 1746,1760 ****
      if (exp == NULL)
        FAIL("NULL regexp argument");
  
!     RExC_utf8 = pm->op_pmdynflags & PMdf_CMP_UTF8;
  
-     RExC_precomp = exp;
      DEBUG_r({
         if (!PL_colorset) reginitcolors();
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
                       PL_colors[4],PL_colors[5],PL_colors[0],
!                      (int)(xend - exp), RExC_precomp, PL_colors[1]);
      });
      RExC_flags = pm->op_pmflags;
      RExC_sawback = 0;
  
--- 1750,1766 ----
      if (exp == NULL)
        FAIL("NULL regexp argument");
  
!     RExC_utf8 = RExC_orig_utf8 = pm->op_pmdynflags & PMdf_CMP_UTF8;
  
      DEBUG_r({
         if (!PL_colorset) reginitcolors();
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
                       PL_colors[4],PL_colors[5],PL_colors[0],
!                      (int)(xend - exp), exp, PL_colors[1]);
      });
+ 
+ redo_first_pass:
+     RExC_precomp = exp;
      RExC_flags = pm->op_pmflags;
      RExC_sawback = 0;
  
***************
*** 1779,1784 ****
--- 1785,1809 ----
      if (reg(pRExC_state, 0, &flags) == NULL) {
        RExC_precomp = Nullch;
        return(NULL);
+     }
+     if (RExC_utf8 && !RExC_orig_utf8) {
+         /* It's possible to write a regexp in ascii that represents unicode
+         codepoints outside of the byte range, such as via \x{100}. If we
+         detect such a sequence we have to convert the entire pattern to utf8
+         and then recompile, as our sizing calculation will have been based
+         on 1 byte == 1 character, but we will need to use utf8 to encode
+         at least some part of the pattern, and therefore must convert the who
le
+         thing.
+         XXX: somehow figure out how to make this less expensive...
+         -- dmq */
+         STRLEN len = xend-exp;
+         DEBUG_r(PerlIO_printf(Perl_debug_log,
+           "UTF8 mismatch! Converting to utf8 for resizing and compile\n"));
+         exp = (char*)Perl_bytes_to_utf8(aTHX_ (U8*)exp, &len);
+         xend = exp + len;
+         RExC_orig_utf8 = RExC_utf8;
+         SAVEFREEPV(exp);
+         goto redo_first_pass;
      }
      DEBUG_r(PerlIO_printf(Perl_debug_log, "size %"IVdf" ", (IV)RExC_size));
  
diff -rc perl-5.8.4/t/op/pat.t perl-5.8.4.patched/t/op/pat.t
*** perl-5.8.4/t/op/pat.t       Thu Apr  1 17:24:30 2004
--- perl-5.8.4.patched/t/op/pat.t       Thu Nov 15 17:08:47 2007
***************
*** 6,12 ****
  
  $| = 1;
  
! print "1..1056\n";
  
  BEGIN {
      chdir 't' if -d 't';
--- 6,12 ----
  
  $| = 1;
  
! print "1..1058\n";
  
  BEGIN {
      chdir 't' if -d 't';
***************
*** 3255,3260 ****
--- 3255,3269 ----
        );
      }
  }
+  
+ {
+     use warnings;
+     my @w;
+     local $SIG{__WARN__}=sub{push @w,"@_"};
+     my $c=qq(\x{DF}); 
+     ok($c=~/${c}|\x{100}/, "ASCII pattern that really is utf8");
+     ok(@w==0, "No warnings");
+ }    
  
  # perl #25269: panic: pp_match start/end pointers
  ok("a-bc" eq eval {

--zx4FCpZtqtKETZ7O
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="regexp-5.8.3.patch"

diff -rc perl-5.8.3/patchlevel.h perl-5.8.3.patched/patchlevel.h
*** perl-5.8.3/patchlevel.h     Wed Jan 14 15:18:14 2004
--- perl-5.8.3.patched/patchlevel.h     Thu Nov 15 17:13:24 2007
***************
*** 123 ****
!       ,NULL
--- 123,124 ----
!       ,"REGEXP0 - fix for UTF-8 recoding in regexps - CVE-2007-5116"
!       ,NULL
diff -rc perl-5.8.3/regcomp.c perl-5.8.3.patched/regcomp.c
*** perl-5.8.3/regcomp.c        Sat Nov  1 16:00:37 2003
--- perl-5.8.3.patched/regcomp.c        Thu Nov 15 17:13:24 2007
***************
*** 126,132 ****
      I32               extralen;
      I32               seen_zerolen;
      I32               seen_evals;
!     I32               utf8;
  #if ADD_TO_REGEXEC
      char      *starttry;              /* -Dr: where regtry was called. */
  #define RExC_starttry (pRExC_state->starttry)
--- 126,135 ----
      I32               extralen;
      I32               seen_zerolen;
      I32               seen_evals;
!     I32               utf8;           /* whether the pattern is utf8 or not *
/
!     I32               orig_utf8;      /* whether the pattern was originally i
n utf8 */
!                               /* XXX use this for future optimisation of case
!                                * where pattern must be upgraded to utf8. */
  #if ADD_TO_REGEXEC
      char      *starttry;              /* -Dr: where regtry was called. */
  #define RExC_starttry (pRExC_state->starttry)
***************
*** 152,157 ****
--- 155,161 ----
  #define RExC_seen_zerolen     (pRExC_state->seen_zerolen)
  #define RExC_seen_evals       (pRExC_state->seen_evals)
  #define RExC_utf8     (pRExC_state->utf8)
+ #define RExC_orig_utf8        (pRExC_state->orig_utf8)
  
  #define       ISMULT1(c)      ((c) == '*' || (c) == '+' || (c) == '?')
  #define       ISMULT2(s)      ((*s) == '*' || (*s) == '+' || (*s) == '?' || \
***************
*** 1746,1760 ****
      if (exp == NULL)
        FAIL("NULL regexp argument");
  
!     RExC_utf8 = pm->op_pmdynflags & PMdf_CMP_UTF8;
  
-     RExC_precomp = exp;
      DEBUG_r({
         if (!PL_colorset) reginitcolors();
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
                       PL_colors[4],PL_colors[5],PL_colors[0],
!                      (int)(xend - exp), RExC_precomp, PL_colors[1]);
      });
      RExC_flags = pm->op_pmflags;
      RExC_sawback = 0;
  
--- 1750,1766 ----
      if (exp == NULL)
        FAIL("NULL regexp argument");
  
!     RExC_utf8 = RExC_orig_utf8 = pm->op_pmdynflags & PMdf_CMP_UTF8;
  
      DEBUG_r({
         if (!PL_colorset) reginitcolors();
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
                       PL_colors[4],PL_colors[5],PL_colors[0],
!                      (int)(xend - exp), exp, PL_colors[1]);
      });
+ 
+ redo_first_pass:
+     RExC_precomp = exp;
      RExC_flags = pm->op_pmflags;
      RExC_sawback = 0;
  
***************
*** 1779,1784 ****
--- 1785,1809 ----
      if (reg(pRExC_state, 0, &flags) == NULL) {
        RExC_precomp = Nullch;
        return(NULL);
+     }
+     if (RExC_utf8 && !RExC_orig_utf8) {
+         /* It's possible to write a regexp in ascii that represents unicode
+         codepoints outside of the byte range, such as via \x{100}. If we
+         detect such a sequence we have to convert the entire pattern to utf8
+         and then recompile, as our sizing calculation will have been based
+         on 1 byte == 1 character, but we will need to use utf8 to encode
+         at least some part of the pattern, and therefore must convert the who
le
+         thing.
+         XXX: somehow figure out how to make this less expensive...
+         -- dmq */
+         STRLEN len = xend-exp;
+         DEBUG_r(PerlIO_printf(Perl_debug_log,
+           "UTF8 mismatch! Converting to utf8 for resizing and compile\n"));
+         exp = (char*)Perl_bytes_to_utf8(aTHX_ (U8*)exp, &len);
+         xend = exp + len;
+         RExC_orig_utf8 = RExC_utf8;
+         SAVEFREEPV(exp);
+         goto redo_first_pass;
      }
      DEBUG_r(PerlIO_printf(Perl_debug_log, "size %"IVdf" ", (IV)RExC_size));
  
diff -rc perl-5.8.3/t/op/pat.t perl-5.8.3.patched/t/op/pat.t
*** perl-5.8.3/t/op/pat.t       Fri Jan  2 00:19:02 2004
--- perl-5.8.3.patched/t/op/pat.t       Thu Nov 15 17:14:32 2007
***************
*** 6,12 ****
  
  $| = 1;
  
! print "1..1055\n";
  
  BEGIN {
      chdir 't' if -d 't';
--- 6,12 ----
  
  $| = 1;
  
! print "1..1057\n";
  
  BEGIN {
      chdir 't' if -d 't';
***************
*** 3254,3259 ****
--- 3254,3268 ----
        );
      }
  }
+  
+ {
+     use warnings;
+     my @w;
+     local $SIG{__WARN__}=sub{push @w,"@_"};
+     my $c=qq(\x{DF}); 
+     ok($c=~/${c}|\x{100}/, "ASCII pattern that really is utf8");
+     ok(@w==0, "No warnings");
+ }    
  
  # last test 1055
  

--zx4FCpZtqtKETZ7O
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="regexp-5.8.0.patch"

diff -rc perl-5.8.0/patchlevel.h perl-5.8.0.patched/patchlevel.h
*** perl-5.8.0/patchlevel.h     Fri Jul 19 00:08:27 2002
--- perl-5.8.0.patched/patchlevel.h     Thu Nov 15 17:18:17 2007
***************
*** 82 ****
!       ,NULL
--- 82,83 ----
!       ,"REGEXP0 - fix for UTF-8 recoding in regexps - CVE-2007-5116"
!       ,NULL
diff -rc perl-5.8.0/regcomp.c perl-5.8.0.patched/regcomp.c
*** perl-5.8.0/regcomp.c        Mon Jul  8 21:10:49 2002
--- perl-5.8.0.patched/regcomp.c        Thu Nov 15 17:18:17 2007
***************
*** 125,131 ****
      I32               extralen;
      I32               seen_zerolen;
      I32               seen_evals;
!     I32               utf8;
  #if ADD_TO_REGEXEC
      char      *starttry;              /* -Dr: where regtry was called. */
  #define RExC_starttry (pRExC_state->starttry)
--- 125,134 ----
      I32               extralen;
      I32               seen_zerolen;
      I32               seen_evals;
!     I32               utf8;           /* whether the pattern is utf8 or not *
/
!     I32               orig_utf8;      /* whether the pattern was originally i
n utf8 */
!                               /* XXX use this for future optimisation of case
!                                * where pattern must be upgraded to utf8. */
  #if ADD_TO_REGEXEC
      char      *starttry;              /* -Dr: where regtry was called. */
  #define RExC_starttry (pRExC_state->starttry)
***************
*** 151,156 ****
--- 154,160 ----
  #define RExC_seen_zerolen     (pRExC_state->seen_zerolen)
  #define RExC_seen_evals       (pRExC_state->seen_evals)
  #define RExC_utf8     (pRExC_state->utf8)
+ #define RExC_orig_utf8        (pRExC_state->orig_utf8)
  
  #define       ISMULT1(c)      ((c) == '*' || (c) == '+' || (c) == '?')
  #define       ISMULT2(s)      ((*s) == '*' || (*s) == '+' || (*s) == '?' || \
***************
*** 1737,1751 ****
      if (exp == NULL)
        FAIL("NULL regexp argument");
  
!     RExC_utf8 = pm->op_pmdynflags & PMdf_CMP_UTF8;
  
-     RExC_precomp = exp;
      DEBUG_r({
         if (!PL_colorset) reginitcolors();
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
                       PL_colors[4],PL_colors[5],PL_colors[0],
!                      (int)(xend - exp), RExC_precomp, PL_colors[1]);
      });
      RExC_flags = pm->op_pmflags;
      RExC_sawback = 0;
  
--- 1741,1757 ----
      if (exp == NULL)
        FAIL("NULL regexp argument");
  
!     RExC_utf8 = RExC_orig_utf8 = pm->op_pmdynflags & PMdf_CMP_UTF8;
  
      DEBUG_r({
         if (!PL_colorset) reginitcolors();
         PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
                       PL_colors[4],PL_colors[5],PL_colors[0],
!                      (int)(xend - exp), exp, PL_colors[1]);
      });
+ 
+ redo_first_pass:
+     RExC_precomp = exp;
      RExC_flags = pm->op_pmflags;
      RExC_sawback = 0;
  
***************
*** 1770,1775 ****
--- 1776,1800 ----
      if (reg(pRExC_state, 0, &flags) == NULL) {
        RExC_precomp = Nullch;
        return(NULL);
+     }
+     if (RExC_utf8 && !RExC_orig_utf8) {
+         /* It's possible to write a regexp in ascii that represents unicode
+         codepoints outside of the byte range, such as via \x{100}. If we
+         detect such a sequence we have to convert the entire pattern to utf8
+         and then recompile, as our sizing calculation will have been based
+         on 1 byte == 1 character, but we will need to use utf8 to encode
+         at least some part of the pattern, and therefore must convert the who
le
+         thing.
+         XXX: somehow figure out how to make this less expensive...
+         -- dmq */
+         STRLEN len = xend-exp;
+         DEBUG_r(PerlIO_printf(Perl_debug_log,
+           "UTF8 mismatch! Converting to utf8 for resizing and compile\n"));
+         exp = (char*)Perl_bytes_to_utf8(aTHX_ (U8*)exp, &len);
+         xend = exp + len;
+         RExC_orig_utf8 = RExC_utf8;
+         SAVEFREEPV(exp);
+         goto redo_first_pass;
      }
      DEBUG_r(PerlIO_printf(Perl_debug_log, "size %"IVdf" ", (IV)RExC_size));
  
diff -rc perl-5.8.0/t/op/pat.t perl-5.8.0.patched/t/op/pat.t
*** perl-5.8.0/t/op/pat.t       Mon Jul  1 15:42:19 2002
--- perl-5.8.0.patched/t/op/pat.t       Thu Nov 15 17:19:14 2007
***************
*** 6,12 ****
  
  $| = 1;
  
! print "1..922\n";
  
  BEGIN {
      chdir 't' if -d 't';
--- 6,12 ----
  
  $| = 1;
  
! print "1..924\n";
  
  BEGIN {
      chdir 't' if -d 't';
***************
*** 2900,2904 ****
--- 2900,2913 ----
        }
      }
  }
+  
+ {
+     use warnings;
+     my @w;
+     local $SIG{__WARN__}=sub{push @w,"@_"};
+     my $c=qq(\x{DF}); 
+     ok($c=~/${c}|\x{100}/, "ASCII pattern that really is utf8");
+     ok(@w==0, "No warnings");
+ }    
  
  $test = 923;

--zx4FCpZtqtKETZ7O--


------- End of Forwarded Message

Reply via email to