Hi all,

Attached is a patch to update irregex to upstream version 0.9.6.
When compiling an absurdly nested regex like ($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($($(${-2,16}+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+)+) the engine would consume gigabytes of memory. The reason is that (+ foo) would be rewritten to (seq foo (* foo)), causing the regex to become twice as large. If the nested regex itself also contains +, this happens recursively. Each subexpression will be compiled to a backtracking matcher, building up closures, which eats up even more memory than just the SRE list representation. The fix is to handle + "natively" instead of rewriting it. The patch also refactors * in terms of + (it is simply + with a failure continuation that carries on matching the next expression). The patch also includes two small changes by Sudarshan S. Chawathe, who improved the clarity of the documentation (sre->string generates a PCRE regex pattern, not a POSIX pattern) and fixed a small bug in the sre matcher of sre->string: (sre->string '(seq)) will give an error "(cddr) bad argument type: ()" instead of returning "" as an empty sequence should. Cheers, Peter
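To make the doubling described above concrete, here is a minimal sketch in portable Scheme. It is a simplified model and not the irregex code: the helpers nest-plus, expand-plus and sre-size are hypothetical names invented for this illustration, and the model only counts SRE nodes (it ignores the extra cost of the closures built by the backtracking compiler). It assumes the old behaviour can be approximated as rewriting (+ foo) to (seq foo (* foo)) at every nesting level.

```scheme
;; Hypothetical helpers for illustration only; none of these exist in irregex.

;; Build an SRE with DEPTH nested + operators around the atom a,
;; e.g. (nest-plus 2) => (+ (+ a)).
(define (nest-plus depth)
  (if (= depth 0)
      'a
      (list '+ (nest-plus (- depth 1)))))

;; Model of the old approach: (+ x) becomes (seq x (* x)).
;; The body is expanded twice on purpose, mirroring the fact that
;; each copy was compiled separately.
(define (expand-plus sre)
  (cond
   ((and (pair? sre) (eq? (car sre) '+))
    (list 'seq
          (expand-plus (cadr sre))
          (list '* (expand-plus (cadr sre)))))
   ((pair? sre)
    (cons (car sre) (map expand-plus (cdr sre))))
   (else sre)))

;; Count nodes: one per operator form plus one per atom.
(define (sre-size sre)
  (if (pair? sre)
      (+ 1 (apply + (map sre-size (cdr sre))))
      1))

(display (sre-size (nest-plus 10)))               ; prints 11
(newline)
(display (sre-size (expand-plus (nest-plus 10)))) ; prints 3070
(newline)
```

In this model the input grows linearly with the nesting depth d (d + 1 nodes), while the rewritten form has 3 * 2^d - 2 nodes, so every extra level of nesting roughly doubles the work. Handling + directly, as the patch does, compiles the body once per level and avoids that growth.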
From bf5470090dab74496600ea91c6c388d45db354cf Mon Sep 17 00:00:00 2001 From: Peter Bex <pe...@more-magic.net> Date: Wed, 14 Dec 2016 20:25:25 +0100 Subject: [PATCH] Update irregex to upstream 0.9.6 This fixes a resource consumption vulnerability due to exponential memory use based on the depth of nested "+" patterns. --- NEWS | 4 ++++ irregex-core.scm | 32 ++++++++++++++++++-------------- irregex-utils.scm | 2 +- manual/Unit irregex | 2 +- 4 files changed, 24 insertions(+), 16 deletions(-) diff --git a/NEWS b/NEWS index 052cf13..cbadd61 100644 --- a/NEWS +++ b/NEWS @@ -1,5 +1,9 @@ 4.11.2 +- Security fixes + - Irregex has been updated to 0.9.6, which fixes an exponential + explosion in compilation of nested "+" patterns. + - Compiler: - Fixed incorrect argvector restoration after GC in directly recursive functions (#1317). diff --git a/irregex-core.scm b/irregex-core.scm index 2d6058c..01e027b 100644 --- a/irregex-core.scm +++ b/irregex-core.scm @@ -30,6 +30,8 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;;; History +;; 0.9.6: 2016/12/05 - fixed exponential memory use of + in compilation +;; of backtracking matcher. ;; 0.9.5: 2016/09/10 - fixed a bug in irregex-fold handling of bow ;; 0.9.4: 2015/12/14 - performance improvement for {n,m} matches ;; 0.9.3: 2014/07/01 - R7RS library @@ -3170,16 +3172,7 @@ ((sre-empty? 
(sre-sequence (cdr sre))) (error "invalid sre: empty *" sre)) (else - (letrec - ((body - (lp (sre-sequence (cdr sre)) - n - flags - (lambda (cnk init src str i end matches fail) - (body cnk init src str i end matches - (lambda () - (next cnk init src str i end matches fail) - )))))) + (let ((body (rec (list '+ (sre-sequence (cdr sre)))))) (lambda (cnk init src str i end matches fail) (body cnk init src str i end matches (lambda () @@ -3204,10 +3197,21 @@ (lambda () (body cnk init src str i end matches fail)))))))) ((+) - (lp (sre-sequence (cdr sre)) - n - flags - (rec (list '* (sre-sequence (cdr sre)))))) + (cond + ((sre-empty? (sre-sequence (cdr sre))) + (error "invalid sre: empty +" sre)) + (else + (letrec + ((body + (lp (sre-sequence (cdr sre)) + n + flags + (lambda (cnk init src str i end matches fail) + (body cnk init src str i end matches + (lambda () + (next cnk init src str i end matches fail) + )))))) + body)))) ((=) (rec `(** ,(cadr sre) ,(cadr sre) ,@(cddr sre)))) ((>=) diff --git a/irregex-utils.scm b/irregex-utils.scm index 8332791..a2195a9 100644 --- a/irregex-utils.scm +++ b/irregex-utils.scm @@ -89,7 +89,7 @@ (case (car x) ((: seq) (cond - ((and (pair? (cddr x)) (pair? (cddr x)) (not (eq? x obj))) + ((and (pair? (cdr x)) (pair? (cddr x)) (not (eq? x obj))) (display "(?:" out) (for-each lp (cdr x)) (display ")" out)) (else (for-each lp (cdr x))))) ((submatch) diff --git a/manual/Unit irregex b/manual/Unit irregex index 7805273..7d59f89 100644 --- a/manual/Unit irregex +++ b/manual/Unit irregex @@ -825,7 +825,7 @@ doesn't help when irregex is able to build a DFA. <procedure>(sre->string <sre>)</procedure> -Convert an SRE to a POSIX-style regular expression string, if +Convert an SRE to a PCRE-style regular expression string, if possible. -- 2.1.4
From 44bd98d897ac414f342a0dd3662a77d48b7597f5 Mon Sep 17 00:00:00 2001 From: Peter Bex <pe...@more-magic.net> Date: Wed, 14 Dec 2016 20:28:12 +0100 Subject: [PATCH] Update irregex to upstream 0.9.6 This fixes a resource consumption vulnerability due to exponential memory use based on the depth of nested "+" patterns. --- NEWS | 4 ++++ irregex-core.scm | 32 ++++++++++++++++++-------------- irregex-utils.scm | 2 +- manual/Unit irregex | 2 +- 4 files changed, 24 insertions(+), 16 deletions(-) diff --git a/NEWS b/NEWS index 9a68b2f..2b097ed 100644 --- a/NEWS +++ b/NEWS @@ -60,6 +60,10 @@ 4.11.2 +- Security fixes + - Irregex has been updated to 0.9.6, which fixes an exponential + explosion in compilation of nested "+" patterns. + - Compiler: - Fixed incorrect argvector restoration after GC in directly recursive functions (#1317). diff --git a/irregex-core.scm b/irregex-core.scm index 0fed1f1..931fed1 100644 --- a/irregex-core.scm +++ b/irregex-core.scm @@ -30,6 +30,8 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;;; History +;; 0.9.6: 2016/12/05 - fixed exponential memory use of + in compilation +;; of backtracking matcher. ;; 0.9.5: 2016/09/10 - fixed a bug in irregex-fold handling of bow ;; 0.9.4: 2015/12/14 - performance improvement for {n,m} matches ;; 0.9.3: 2014/07/01 - R7RS library @@ -3165,16 +3167,7 @@ ((sre-empty? 
(sre-sequence (cdr sre))) (error "invalid sre: empty *" sre)) (else - (letrec - ((body - (lp (sre-sequence (cdr sre)) - n - flags - (lambda (cnk init src str i end matches fail) - (body cnk init src str i end matches - (lambda () - (next cnk init src str i end matches fail) - )))))) + (let ((body (rec (list '+ (sre-sequence (cdr sre)))))) (lambda (cnk init src str i end matches fail) (body cnk init src str i end matches (lambda () @@ -3199,10 +3192,21 @@ (lambda () (body cnk init src str i end matches fail)))))))) ((+) - (lp (sre-sequence (cdr sre)) - n - flags - (rec (list '* (sre-sequence (cdr sre)))))) + (cond + ((sre-empty? (sre-sequence (cdr sre))) + (error "invalid sre: empty +" sre)) + (else + (letrec + ((body + (lp (sre-sequence (cdr sre)) + n + flags + (lambda (cnk init src str i end matches fail) + (body cnk init src str i end matches + (lambda () + (next cnk init src str i end matches fail) + )))))) + body)))) ((=) (rec `(** ,(cadr sre) ,(cadr sre) ,@(cddr sre)))) ((>=) diff --git a/irregex-utils.scm b/irregex-utils.scm index 8332791..a2195a9 100644 --- a/irregex-utils.scm +++ b/irregex-utils.scm @@ -89,7 +89,7 @@ (case (car x) ((: seq) (cond - ((and (pair? (cddr x)) (pair? (cddr x)) (not (eq? x obj))) + ((and (pair? (cdr x)) (pair? (cddr x)) (not (eq? x obj))) (display "(?:" out) (for-each lp (cdr x)) (display ")" out)) (else (for-each lp (cdr x))))) ((submatch) diff --git a/manual/Unit irregex b/manual/Unit irregex index 7daff8c..063a918 100644 --- a/manual/Unit irregex +++ b/manual/Unit irregex @@ -825,7 +825,7 @@ doesn't help when irregex is able to build a DFA. <procedure>(sre->string <sre>)</procedure> -Convert an SRE to a POSIX-style regular expression string, if +Convert an SRE to a PCRE-style regular expression string, if possible. -- 2.1.4
_______________________________________________ Chicken-hackers mailing list Chicken-hackers@nongnu.org https://lists.nongnu.org/mailman/listinfo/chicken-hackers