Hi all,

Attached is a patch to update irregex to upstream version 0.9.6.

When compiling an absurdly nested regex like 
($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($(${-2,16}+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)
the engine would consume gigabytes of memory.

The reason is that (+ foo) was rewritten to (seq foo (* foo)), which
doubles the size of the regex.  If the nested regex itself also
contains +, this happens recursively, so the size grows exponentially
with the nesting depth.  Each subexpression is then compiled to a
backtracking matcher, building up closures, which eats up even more
memory than the SRE list representation alone.
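
To make the blow-up concrete, here is a minimal sketch that models the
old rewrite on the SRE list only.  The names nest-plus, expand-plus and
node-count are hypothetical helpers for illustration, not irregex
procedures, and the real compiler builds closures rather than lists.

```scheme
;; Hypothetical helpers (not part of irregex): they model the old
;; rewrite of (+ x) into (seq x (* x)) to show how the tree grows.

;; Wrap `leaf` in `depth` levels of +, e.g. (+ (+ a)) for depth 2.
(define (nest-plus depth leaf)
  (if (= depth 0)
      leaf
      (list '+ (nest-plus (- depth 1) leaf))))

;; Apply the old rewrite bottom-up to every + in the expression.
(define (expand-plus sre)
  (cond ((not (pair? sre)) sre)
        ((eq? (car sre) '+)
         (let ((body (expand-plus (cadr sre))))
           (list 'seq body (list '* body))))
        (else (cons (car sre) (map expand-plus (cdr sre))))))

;; Count operators and leaves as a compiler walking the tree sees them.
(define (node-count sre)
  (if (pair? sre)
      (apply + 1 (map node-count (cdr sre)))
      1))

(node-count (nest-plus 10 'a))               ; => 11
(node-count (expand-plus (nest-plus 1 'a)))  ; => 4
(node-count (expand-plus (nest-plus 2 'a)))  ; => 10
(node-count (expand-plus (nest-plus 10 'a))) ; => 3070
```

In this model each extra level of + roughly doubles the expanded
size, while the unexpanded pattern grows by one node per level.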

The fix is to handle + "natively" instead of rewriting it.  The patch
also refactors * in terms of + (it is simply + with a failure
continuation that carries on matching the next expression).
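
The shape of the fix can be sketched with a toy continuation-passing
matcher.  This is not irregex's code: compile-sre and toy-match are
hypothetical names, the matcher signature is simplified to
(str i fail), and only characters, seq, + and * are supported.

```scheme
;; Toy backtracking compiler, a sketch of the idea only.
;; A matcher is (lambda (str i fail) ...): it returns the end index of
;; a match, or calls the thunk `fail` to backtrack.
(define (compile-sre sre next)
  (cond
   ((char? sre)
    (lambda (str i fail)
      (if (and (< i (string-length str))
               (char=? (string-ref str i) sre))
          (next str (+ i 1) fail)
          (fail))))
   ((eq? (car sre) 'seq)
    ;; Chain the parts right to left so each continues into the next.
    (let loop ((parts (cdr sre)))
      (if (null? parts)
          next
          (compile-sre (car parts) (loop (cdr parts))))))
   ((eq? (car sre) '+)
    ;; Native +: the body is compiled once.  Its continuation loops
    ;; back into the body and falls through to `next` when that fails.
    (letrec ((body (compile-sre
                    (cadr sre)
                    (lambda (str i fail)
                      (body str i
                            (lambda () (next str i fail)))))))
      body))
   ((eq? (car sre) '*)
    ;; * is + with a failure continuation that carries on with `next`.
    (let ((body (compile-sre (list '+ (cadr sre)) next)))
      (lambda (str i fail)
        (body str i (lambda () (next str i fail))))))
   (else (error "unsupported sre" sre))))

;; Match a prefix of `str`; return its end index, or #f on failure.
(define (toy-match sre str)
  ((compile-sre sre (lambda (str i fail) i))
   str 0 (lambda () #f)))

(toy-match '(+ #\a) "aaab")             ; => 3
(toy-match '(+ #\a) "b")                ; => #f
(toy-match '(* #\a) "b")                ; => 0
(toy-match '(seq (* #\a) #\b) "aab")    ; => 3
```

Because the + case compiles its body exactly once, nesting + inside +
adds a constant amount of work per level in this sketch.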

The patch also includes two small changes by Sudarshan S. Chawathe,
who improved the clarity of the documentation (sre->string generates
a PCRE regex pattern, not a POSIX pattern) and fixed a small bug in
the seq handling of sre->string: (sre->string '(seq)) used to raise
the error "(cddr) bad argument type: ()" instead of returning "", as
an empty sequence should.
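
The guard involved can be illustrated in isolation.  The predicate
needs-group? below is a hypothetical stand-in modelled on the test in
the seq case of sre->string, not a procedure that exists in irregex.

```scheme
;; Hypothetical stand-in for the guard in sre->string's seq case:
;; does a (seq ...) form have two or more parts, so that it needs
;; "(?:...)" grouping?
(define (needs-group? x)
  (and (pair? (cdr x))     ; protects the cddr below when x is '(seq)
       (pair? (cddr x))))

(needs-group? '(seq))          ; => #f
(needs-group? '(seq "a"))      ; => #f
(needs-group? '(seq "a" "b"))  ; => #t
```

The old code tested (pair? (cddr x)) twice, so for '(seq) it took the
cdr of the empty list and signalled the error quoted above.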

Cheers,
Peter
From bf5470090dab74496600ea91c6c388d45db354cf Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Wed, 14 Dec 2016 20:25:25 +0100
Subject: [PATCH] Update irregex to upstream 0.9.6

This fixes a resource consumption vulnerability due to exponential
memory use based on the depth of nested "+" patterns.
---
 NEWS                |  4 ++++
 irregex-core.scm    | 32 ++++++++++++++++++--------------
 irregex-utils.scm   |  2 +-
 manual/Unit irregex |  2 +-
 4 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/NEWS b/NEWS
index 052cf13..cbadd61 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,9 @@
 4.11.2
 
+- Security fixes
+  - Irregex has been updated to 0.9.6, which fixes an exponential
+    explosion in compilation of nested "+" patterns.
+
 - Compiler:
   - Fixed incorrect argvector restoration after GC in directly
     recursive functions (#1317).
diff --git a/irregex-core.scm b/irregex-core.scm
index 2d6058c..01e027b 100644
--- a/irregex-core.scm
+++ b/irregex-core.scm
@@ -30,6 +30,8 @@
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;; History
+;; 0.9.6: 2016/12/05 - fixed exponential memory use of + in compilation
+;;                     of backtracking matcher.
 ;; 0.9.5: 2016/09/10 - fixed a bug in irregex-fold handling of bow
 ;; 0.9.4: 2015/12/14 - performance improvement for {n,m} matches
 ;; 0.9.3: 2014/07/01 - R7RS library
@@ -3170,16 +3172,7 @@
               ((sre-empty? (sre-sequence (cdr sre)))
                (error "invalid sre: empty *" sre))
               (else
-               (letrec
-                   ((body
-                     (lp (sre-sequence (cdr sre))
-                         n
-                         flags
-                         (lambda (cnk init src str i end matches fail)
-                           (body cnk init src str i end matches
-                                 (lambda ()
-                                   (next cnk init src str i end matches fail)
-                                   ))))))
+               (let ((body (rec (list '+ (sre-sequence (cdr sre))))))
                  (lambda (cnk init src str i end matches fail)
                    (body cnk init src str i end matches
                          (lambda ()
@@ -3204,10 +3197,21 @@
                          (lambda ()
                            (body cnk init src str i end matches fail))))))))
             ((+)
-             (lp (sre-sequence (cdr sre))
-                 n
-                 flags
-                 (rec (list '* (sre-sequence (cdr sre))))))
+             (cond
+              ((sre-empty? (sre-sequence (cdr sre)))
+               (error "invalid sre: empty +" sre))
+              (else
+               (letrec
+                   ((body
+                     (lp (sre-sequence (cdr sre))
+                         n
+                         flags
+                         (lambda (cnk init src str i end matches fail)
+                           (body cnk init src str i end matches
+                                 (lambda ()
+                                   (next cnk init src str i end matches fail)
+                                   ))))))
+                 body))))
             ((=)
              (rec `(** ,(cadr sre) ,(cadr sre) ,@(cddr sre))))
             ((>=)
diff --git a/irregex-utils.scm b/irregex-utils.scm
index 8332791..a2195a9 100644
--- a/irregex-utils.scm
+++ b/irregex-utils.scm
@@ -89,7 +89,7 @@
         (case (car x)
           ((: seq)
            (cond
-            ((and (pair? (cddr x)) (pair? (cddr x)) (not (eq? x obj)))
+            ((and (pair? (cdr x)) (pair? (cddr x)) (not (eq? x obj)))
              (display "(?:" out) (for-each lp (cdr x)) (display ")" out))
             (else (for-each lp (cdr x)))))
           ((submatch)
diff --git a/manual/Unit irregex b/manual/Unit irregex
index 7805273..7d59f89 100644
--- a/manual/Unit irregex	
+++ b/manual/Unit irregex	
@@ -825,7 +825,7 @@ doesn't help when irregex is able to build a DFA.
 
 <procedure>(sre->string <sre>)</procedure>
 
-Convert an SRE to a POSIX-style regular expression string, if
+Convert an SRE to a PCRE-style regular expression string, if
 possible.
 
 
-- 
2.1.4

From 44bd98d897ac414f342a0dd3662a77d48b7597f5 Mon Sep 17 00:00:00 2001
From: Peter Bex <pe...@more-magic.net>
Date: Wed, 14 Dec 2016 20:28:12 +0100
Subject: [PATCH] Update irregex to upstream 0.9.6

This fixes a resource consumption vulnerability due to exponential
memory use based on the depth of nested "+" patterns.
---
 NEWS                |  4 ++++
 irregex-core.scm    | 32 ++++++++++++++++++--------------
 irregex-utils.scm   |  2 +-
 manual/Unit irregex |  2 +-
 4 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/NEWS b/NEWS
index 9a68b2f..2b097ed 100644
--- a/NEWS
+++ b/NEWS
@@ -60,6 +60,10 @@
 
 4.11.2
 
+- Security fixes
+  - Irregex has been updated to 0.9.6, which fixes an exponential
+    explosion in compilation of nested "+" patterns.
+
 - Compiler:
   - Fixed incorrect argvector restoration after GC in directly
     recursive functions (#1317).
diff --git a/irregex-core.scm b/irregex-core.scm
index 0fed1f1..931fed1 100644
--- a/irregex-core.scm
+++ b/irregex-core.scm
@@ -30,6 +30,8 @@
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;; History
+;; 0.9.6: 2016/12/05 - fixed exponential memory use of + in compilation
+;;                     of backtracking matcher.
 ;; 0.9.5: 2016/09/10 - fixed a bug in irregex-fold handling of bow
 ;; 0.9.4: 2015/12/14 - performance improvement for {n,m} matches
 ;; 0.9.3: 2014/07/01 - R7RS library
@@ -3165,16 +3167,7 @@
               ((sre-empty? (sre-sequence (cdr sre)))
                (error "invalid sre: empty *" sre))
               (else
-               (letrec
-                   ((body
-                     (lp (sre-sequence (cdr sre))
-                         n
-                         flags
-                         (lambda (cnk init src str i end matches fail)
-                           (body cnk init src str i end matches
-                                 (lambda ()
-                                   (next cnk init src str i end matches fail)
-                                   ))))))
+               (let ((body (rec (list '+ (sre-sequence (cdr sre))))))
                  (lambda (cnk init src str i end matches fail)
                    (body cnk init src str i end matches
                          (lambda ()
@@ -3199,10 +3192,21 @@
                          (lambda ()
                            (body cnk init src str i end matches fail))))))))
             ((+)
-             (lp (sre-sequence (cdr sre))
-                 n
-                 flags
-                 (rec (list '* (sre-sequence (cdr sre))))))
+             (cond
+              ((sre-empty? (sre-sequence (cdr sre)))
+               (error "invalid sre: empty +" sre))
+              (else
+               (letrec
+                   ((body
+                     (lp (sre-sequence (cdr sre))
+                         n
+                         flags
+                         (lambda (cnk init src str i end matches fail)
+                           (body cnk init src str i end matches
+                                 (lambda ()
+                                   (next cnk init src str i end matches fail)
+                                   ))))))
+                 body))))
             ((=)
              (rec `(** ,(cadr sre) ,(cadr sre) ,@(cddr sre))))
             ((>=)
diff --git a/irregex-utils.scm b/irregex-utils.scm
index 8332791..a2195a9 100644
--- a/irregex-utils.scm
+++ b/irregex-utils.scm
@@ -89,7 +89,7 @@
         (case (car x)
           ((: seq)
            (cond
-            ((and (pair? (cddr x)) (pair? (cddr x)) (not (eq? x obj)))
+            ((and (pair? (cdr x)) (pair? (cddr x)) (not (eq? x obj)))
              (display "(?:" out) (for-each lp (cdr x)) (display ")" out))
             (else (for-each lp (cdr x)))))
           ((submatch)
diff --git a/manual/Unit irregex b/manual/Unit irregex
index 7daff8c..063a918 100644
--- a/manual/Unit irregex	
+++ b/manual/Unit irregex	
@@ -825,7 +825,7 @@ doesn't help when irregex is able to build a DFA.
 
 <procedure>(sre->string <sre>)</procedure>
 
-Convert an SRE to a POSIX-style regular expression string, if
+Convert an SRE to a PCRE-style regular expression string, if
 possible.
 
 
-- 
2.1.4

_______________________________________________
Chicken-hackers mailing list
Chicken-hackers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
