Thomas Hartman wrote:

testPcre = ( subRegex (mkRegex "(?<!\n)\n(?!\n)") "asdf\n \n\n\nadsf"
"" ) == "asdf \n\n\nadsf"

quoting from the man page for regcomp:

REG_NEWLINE   Compile for newline-sensitive matching.  By default, newline is a
              completely ordinary character with no special meaning in either
              REs or strings.  With this flag, `[^' bracket expressions and `.'
              never match newline, a `^' anchor matches the null string after
              any newline in the string in addition to its normal function, and
              the `$' anchor matches the null string before any newline in the
              string in addition to its normal function.

This is carried over to Text.Regex with

mkRegexWithOpts
  :: String  -- The regular expression to compile
  -> Bool    -- True <=> '^' and '$' match the beginning and end of
             -- individual lines respectively, and '.' does not match
             -- the newline character.
  -> Bool    -- True <=> matching is case-sensitive
  -> Regex   -- Returns: the compiled regular expression

Makes a regular expression, where the multi-line and case-sensitive options
can be changed from the default settings.

Or with regex-posix directly the flag is "compNewline":
http://hackage.haskell.org/packages/archive/regex-posix/0.94.1/doc/html/Text-Regex-Posix-Wrap.html
> The defaultCompOpt is (compExtended .|. compNewline).
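
To see what compNewline changes, here is a minimal sketch that compiles the same small pattern with and without the flag through regex-posix. The pattern "a$" and the names withNewline and withoutNewline are only illustrative; the sketch assumes the regex-posix package is installed.

```haskell
import Data.Bits ((.|.))
import Text.Regex.Posix
  (Regex, compExtended, compNewline, defaultExecOpt, makeRegexOpts, matchTest)

-- Illustrative names: the same pattern compiled with and without compNewline.
withNewline, withoutNewline :: Regex
withNewline    = makeRegexOpts (compExtended .|. compNewline) defaultExecOpt "a$"
withoutNewline = makeRegexOpts compExtended defaultExecOpt "a$"

main :: IO ()
main = do
  -- With compNewline, '$' also matches just before the embedded newline.
  print (matchTest withNewline "a\nb")     -- True
  -- Without it, '$' matches only at the very end of the text.
  print (matchTest withoutNewline "a\nb")  -- False
```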

You want to match a \n that is not next to any other \n.

So you want to turn off REG_NEWLINE.

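import Text.Regex  -- from the regex-compat package

r :: Regex
-- The first Bool must be False: it leaves REG_NEWLINE off, so ^ and $
-- match only at the start and end of the whole text.
r = mkRegexWithOpts "(^|[^\n])\n($|[^\n])" False True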


The ^ and $ take care of matching a lone newline at the start or end of the whole text. In the middle of the text the pattern is equivalent to [^\n]\n[^\n].
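
A quick way to check these cases is to try the pattern on a few small strings. This is a sketch, assuming Text.Regex from the regex-compat package; matches is a hypothetical helper, and newlineSensitive is there only to show what happens if the first Bool is True instead.

```haskell
import Data.Maybe (isJust)
import Text.Regex (Regex, matchRegex, mkRegexWithOpts)

-- The same pattern, compiled without and with newline-sensitive matching.
loneNewline, newlineSensitive :: Regex
loneNewline      = mkRegexWithOpts "(^|[^\n])\n($|[^\n])" False True
newlineSensitive = mkRegexWithOpts "(^|[^\n])\n($|[^\n])" True  True

-- Hypothetical helper: does the regex match anywhere in the string?
matches :: Regex -> String -> Bool
matches re = isJust . matchRegex re

main :: IO ()
main = mapM_ print
  [ matches loneNewline "\nabc"        -- lone newline at the start: True
  , matches loneNewline "abc\n"        -- lone newline at the end: True
  , matches loneNewline "a\nb"         -- lone newline in the middle: True
  , matches loneNewline "a\n\nb"       -- two adjacent newlines: False
  , matches newlineSensitive "a\n\nb"  -- ^ and $ now match around each newline: True
  ]
```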

When substituting you can use the \1 and \2 captures to restore the matched non-newline character if one was present.
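
Putting that together for the original test case, a substitution might look like the sketch below. It assumes Text.Regex from regex-compat; joinLoneNewlines is a hypothetical name, and the replacement "\\1\\2" puts back whatever the two groups captured while dropping the newline between them.

```haskell
import Text.Regex (Regex, mkRegexWithOpts, subRegex)

-- A newline with no other newline on either side (REG_NEWLINE off).
loneNewline :: Regex
loneNewline = mkRegexWithOpts "(^|[^\n])\n($|[^\n])" False True

-- Hypothetical helper: delete lone newlines, keeping the neighbouring
-- characters that the two capture groups consumed.
joinLoneNewlines :: String -> String
joinLoneNewlines s = subRegex loneNewline s "\\1\\2"

main :: IO ()
main = print (joinLoneNewlines "asdf\n \n\n\nadsf" == "asdf \n\n\nadsf")
```

Because each match also consumes the character after the newline, lone newlines separated by a single character (as in "a\nb\nc") may need a second pass.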

