Hi Chris,

"Chris K. Jester-Young" <cky...@gmail.com> wrote:
> I'm currently implementing regexp-split for Guile, which provides a
> Perl-style split function (including correctly implementing the "limit"
> parameter), minus the special awk-style whitespace handling (that is
> used with a pattern of " ", as opposed to / /, with Perl's split).

Woow, I don’t understand what you’re saying. :-)

> Attached is a couple of patches, to support the regexp-split function
> which I'm proposing at the bottom of this message:
>
> 1. The first fixes the behaviour of fold-matches and list-matches when
>    the pattern contains a ^ (identical to the patch in my last email).

This one was already applied.

> 2. The second adds the ability to limit the number of matches done.
>    This applies on top of the first patch.
>
> Some comments about the regexp-split implementation: the value that's
> being passed to regexp-split-fold is a cons, where the car is the last
> match's end position, and the cdr is the substrings so far collected.
>
> The special check in regexp-split-fold for match-end being zero is to
> emulate a specific behaviour as documented for Perl's split: "Empty
> leading fields are produced when there are positive-width matches at
> the beginning of the string; a zero-width match at the beginning of the
> string does not produce an empty field."

OK.

The semantics of ‘limits’ in ‘regexp-split’ look somewhat awkward to me,
but I’ve no better idea, and I understand the rationale and appeal to
Perl hackers (yuk!).

Stylistic comments:

> From 147dc0d7fd9ab04d10b4f13cecf47a32c5b6c4b6 Mon Sep 17 00:00:00 2001
> From: "Chris K. Jester-Young" <cky...@gmail.com>
> Date: Mon, 17 Sep 2012 01:06:07 -0400
> Subject: [PATCH 2/2] Add "limit" parameter to fold-matches and list-matches.
>
> * doc/ref/api-regex.texi: Document new "limit" parameter.
>
> * module/ice-9/regex.scm (fold-matches, list-matches): Optionally take
>   a "limit" argument that, if specified, limits how many times the
>   pattern is matched.
>
> * test-suite/tests/regexp.test (fold-matches): Add tests for the correct
>   functioning of the limit parameter.
> ---
>  doc/ref/api-regex.texi       | 10 ++++++----
>  module/ice-9/regex.scm       | 18 ++++++++++--------
>  test-suite/tests/regexp.test | 16 +++++++++++++++-
>  3 files changed, 31 insertions(+), 13 deletions(-)
>
>
> diff --git a/doc/ref/api-regex.texi b/doc/ref/api-regex.texi
> index 082fb87..2d2243f 100644
> --- a/doc/ref/api-regex.texi
> +++ b/doc/ref/api-regex.texi
> @@ -189,11 +189,12 @@ or @code{#f} otherwise.
>  @end deffn
>
>  @sp 1
> -@deffn {Scheme Procedure} list-matches regexp str [flags]
> +@deffn {Scheme Procedure} list-matches regexp str [flags [limit]]
>  Return a list of match structures which are the non-overlapping
>  matches of @var{regexp} in @var{str}. @var{regexp} can be either a
>  pattern string or a compiled regexp. The @var{flags} argument is as
> -per @code{regexp-exec} above.
> +per @code{regexp-exec} above. The @var{limit} argument, if specified,
> +limits how many times @var{regexp} is matched.

Rather something non-ambiguous like:

  Match @var{regexp} at most @var{limit} times, or an indefinite number
  of times when @var{limit} is omitted or equal to @code{0}.

> -@deffn {Scheme Procedure} fold-matches regexp str init proc [flags]
> +@deffn {Scheme Procedure} fold-matches regexp str init proc [flags [limit]]
>  Apply @var{proc} to the non-overlapping matches of @var{regexp} in
>  @var{str}, to build a result. @var{regexp} can be either a pattern
>  string or a compiled regexp. The @var{flags} argument is as per
> -@code{regexp-exec} above.
> +@code{regexp-exec} above. The @var{limit} argument, if specified,
> +limits how many times @var{regexp} is matched.

Likewise.
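
To make that wording concrete, here is a minimal sketch of the check it
would imply. The helper name `more-matches-allowed?' is hypothetical and
not part of the patch. Note that the test in the quoted patch,
(or (not limit) (< count limit)), stops immediately when the limit is 0,
so code and documentation would have to settle on one behaviour.

```scheme
;; Hypothetical helper, for illustration only: returns #t while another
;; match may still be attempted.  COUNT is the number of matches found so
;; far; LIMIT is #f when omitted.  A LIMIT of 0 is treated as "no limit",
;; following the documentation wording suggested above (an assumption,
;; since the patch as posted treats 0 as "no matches").
(define (more-matches-allowed? count limit)
  (or (not limit)
      (zero? limit)
      (< count limit)))

(more-matches-allowed? 5 #f)   ; limit omitted: keep matching
(more-matches-allowed? 5 0)    ; limit 0: keep matching
(more-matches-allowed? 2 2)    ; limit reached: stop
```
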
>  @var{proc} is called as @code{(@var{proc} match prev)} where
>  @var{match} is a match structure and @var{prev} is the previous return
> diff --git a/module/ice-9/regex.scm b/module/ice-9/regex.scm
> index 08ae2c2..0ffe74c 100644
> --- a/module/ice-9/regex.scm
> +++ b/module/ice-9/regex.scm
> @@ -167,26 +167,28 @@
>  ;;; `b'. Around or within `xxx', only the match covering all three
>  ;;; x's counts, because the rest are not maximal.
>
> -(define* (fold-matches regexp string init proc #:optional (flags 0))
> +(define* (fold-matches regexp string init proc #:optional (flags 0) limit)
>    (let ((regexp (if (regexp? regexp) regexp (make-regexp regexp))))
>      (let loop ((start 0)
> +               (count 0)
>                 (value init)
>                 (abuts #f)) ; True if start abuts a previous match.
> -      (define bol (if (zero? start) 0 regexp/notbol))
> -      (let ((m (if (> start (string-length string)) #f
> -                   (regexp-exec regexp string start (logior flags bol)))))
> +      (let* ((bol (if (zero? start) 0 regexp/notbol))
> +             (m (and (or (not limit) (< count limit))
> +                     (<= start (string-length string))
> +                     (regexp-exec regexp string start (logior flags bol)))))
>          (cond
>           ((not m) value)
>           ((and (= (match:start m) (match:end m)) abuts)
>            ;; We matched an empty string, but that would overlap the
>            ;; match immediately before. Try again at a position
>            ;; further to the right.
> -          (loop (+ start 1) value #f))
> +          (loop (1+ start) count value #f))
>           (else
> -          (loop (match:end m) (proc m value) #t)))))))
> +          (loop (match:end m) (1+ count) (proc m value) #t)))))))
>
> -(define* (list-matches regexp string #:optional (flags 0))
> -  (reverse! (fold-matches regexp string '() cons flags)))
> +(define* (list-matches regexp string #:optional (flags 0) limit)
> +  (reverse! (fold-matches regexp string '() cons flags limit)))
>
>  (define (regexp-substitute/global port regexp string . items)

[...]

> +    (pass-if "with limit"
> +      (equal? '("foo" "foo")
> +              (fold-matches "foo" "foofoofoofoo" '()
> +                            (lambda (match result)
> +                              (cons (match:substring match)
> +                                    result)) 0 2))))

Indent like Thien-Thi suggested.

Could you send an updated patch?

Thanks!

Ludo’.
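
PS: For anyone who wants to try the limit behaviour without applying the
patch, the following is a standalone sketch. It copies the patched loop
quoted above under the hypothetical name `fold-matches/limit', chosen
only so that it does not shadow the real `fold-matches'. The final
expression mirrors the "with limit" test from the patch.

```scheme
(use-modules (ice-9 regex))

;; Sketch only: same body as the patched `fold-matches' quoted above,
;; under a hypothetical name.  LIMIT is #f when omitted, meaning no limit.
(define* (fold-matches/limit regexp string init proc
                             #:optional (flags 0) limit)
  (let ((regexp (if (regexp? regexp) regexp (make-regexp regexp))))
    (let loop ((start 0)
               (count 0)                ; Matches passed to PROC so far.
               (value init)
               (abuts #f))              ; True if start abuts a previous match.
      (let* ((bol (if (zero? start) 0 regexp/notbol))
             ;; Stop once LIMIT matches were found or START is past the end.
             (m (and (or (not limit) (< count limit))
                     (<= start (string-length string))
                     (regexp-exec regexp string start (logior flags bol)))))
        (cond
         ((not m) value)
         ((and (= (match:start m) (match:end m)) abuts)
          ;; Empty match overlapping the previous one: retry one
          ;; character further, without counting it against LIMIT.
          (loop (1+ start) count value #f))
         (else
          (loop (match:end m) (1+ count) (proc m value) #t)))))))

;; Same call as in the patch's test: flags 0, limit 2.
(fold-matches/limit "foo" "foofoofoofoo" '()
                    (lambda (match result)
                      (cons (match:substring match) result))
                    0 2)
;; => ("foo" "foo")
```

Only successful calls to PROC count towards the limit; the retry after
an overlapping empty match leaves the counter untouched.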