Thanks for the detailed explanation, Tobias.
Guess I need to brush up on regular expressions.

Mortimer
On Thu, Jan 6, 2022 at 9:43 AM Tobias Geerinckx-Rice <[email protected]> wrote:
> Hullo Mortimer,
>
> I hope this answer isn't too basic for you.  This input:
>
> Mortimer Cladwell wrote:
> > ---input.txt(2)-------
> > foo(abc)bar(def)
>
> does not match the extended regular expression:
>
> > "foo([a-z]+)bar(.*)$"
>
> This would:
>
> ---input.txt(3)-------
> foolishbarista
>
> result: bazlishista
>
> I'm not the one to either write or recommend a tutorial on
> extended regular expressions, but you'll find plenty on the 'net.
> There's also ‘info (grep)Regular Expressions’ which might be good.
> These things aren't specific to Guile, although a few dialects
> exist, and I think Guile uses the POSIX one.  The differences are
> quite small.
>
> In this specific example
>
> > ("foo([a-z]+)bar(.*)$" all letters end)
>
> the first string is an extended regular expression.
>
> It will match a literal ‘foo’ anywhere on a line, followed by 1 or
> more lowercase letters, followed by a literal ‘bar’, followed by
> anything until the end of the line.
>
> It will NOT match anything with ‘()’ brackets in it, like your
> original input.txt(2).  The brackets are regexp syntax used for
> grouping and capturing.
>
> If an optional variable name follows the regexp, it will be set to
> the complete match.  Here, that is ‘all’, which in our example
> will contain "foolishbarista".  It's not used here.
>
> In practice, this variable would be named ‘_’ to indicate that
> it's unimportant:
>
>   (("foo([a-z]+)bar(.*)$" _ letters end)
>    (string-append "baz" letters end))
>
> but the author of the manual example thought that ‘all’ would be
> more clear.
>
> Each subsequent optional variable will be set to the content
> matched by () groups.  Here, ‘letters’ will be set to whatever
> matched ‘[a-z]+’, and ‘end’ to whatever matched ‘.*’.
>
> In our example ‘letters’ is "lish" and ‘end’ is "ista".
>
> This is powerful, because we can construct arbitrary strings at
> run time that can differ significantly for each line that
> matches the same regexp:
>
> > (string-append "baz" letter end)
>
> is just Scheme code that uses the captured variables above,
> without hard-coding assumptions about what was matched.
>
>   footbarnacles → baztnacles
>   foodiebarmaid → bazdiemaid
>   …
>
> Minutes of fun.
>
> This special meaning of ‘()’ in extended regexps means that if you
> wanted to match:
>
> ---input.txt(4)-------
> fo(bizzle)
>
> you'd write:
>
>   "fo\\(bizzle\\)"
>
> Because "\" in a string *also* has special meaning to Guile
> itself, we have to write "\\(" if we want the regexp engine to see
> "\(".
>
> > Is the letters/letter in the manual a typo? If I use letter I get
> > "...unbound variable..."
>
> Yes, that was a typo; both names should match.  I've fixed it.
> Thanks for apparently being the first to test this snippet!
>
> Kind regards,
>
> T G-R
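
For anyone who wants to try the examples from this thread at a Guile REPL, here is a minimal sketch. It assumes plain Guile with the (ice-9 regex) module rather than the manual's own snippet, and the names `rx` and `rewrite` are made up for illustration; they are not from the manual.

```scheme
(use-modules (ice-9 regex))

;; Extended regexp from the thread: a literal "foo", one or more
;; lowercase letters (group 1), a literal "bar", then the rest of
;; the line (group 2).
(define rx (make-regexp "foo([a-z]+)bar(.*)$" regexp/extended))

;; Hypothetical helper: replace the matched part of LINE with
;; "baz" + letters + end, or return LINE unchanged if nothing matches.
(define (rewrite line)
  (let ((m (regexp-exec rx line)))
    (if m
        (let ((letters (match:substring m 1))
              (end     (match:substring m 2)))
          (string-append (match:prefix m) "baz" letters end))
        line)))

(display (rewrite "foolishbarista"))   (newline) ; prints bazlishista
(display (rewrite "foo(abc)bar(def)")) (newline) ; no match, printed unchanged

;; Literal parentheses: the regexp engine must see "\(" and "\)",
;; so each backslash is doubled inside the Scheme string.
(display (if (string-match "fo\\(bizzle\\)" "fo(bizzle)")
             "match"
             "no match"))
(newline)                                        ; prints match
```

The helper returns the line untouched when the regexp does not match, which is why input.txt(2) comes back as-is.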
