Inspired by an idea by Andrew Pang and an old project of mine, I decided to fill out a reusable regular expression library which is similar to Perl's, but much more expressive.
It provides regular and monadic versions, a very overloaded and useful interface, as well as extensibility. although currently the only instance is based on Text.Regex, it generalizes to matching lists of arbitrary type, not just strings, and also leaves the door open for compile-time checked and optimized regular expressions via template Haskell. so, does this seem interesting? I am really enjoying the =~ syntax already in projects, and the monadic version is great for concise ad-hoc parsers. my next steps are to 1. finish my template Haskell regex compiler, not only to optimize at compile time, but allow matching of arbitrary types. not just strings. (perhaps reusing some code from the old Regular Expressions in Haskell project) 2. make a pcre binding. hook it in. and possible ideas for the future are * remove restriction to lists, so tree-like structures can be matched. perhaps grabbing default value from Monoid. The only reason I don't have this now is I didn't want to enable overlapping-instances, things are already complicated enough :) * add further optimized versions of match*, perhaps split up class for when there is no clear concept of the preceding part and postceeding parts around a match.. * develop a substitution syntax * implement pcre in pure Haskell although, even now, it seems quite useful. new versions will appear at http://repetae.net/john/computer/haskell/RegexSyntax.hs plus, this is something to brag about to Perl people, they can only do things differently based on scalar vs. list context. our =~ does 10 different things typesafely and probably some more I missed :) {-# OPTIONS -fglasgow-exts #-} module RegexSyntax(RegexLike(..), RegexLikeImp(..), RegexContext(..), (!~)) where import Array import Text.Regex import Maybe {- basic usage: > string =~ "regular expression" returns different things depending on context type - what it evaluates to --------------------------- Int - number of times the regular expression matches String - matching portion of string (String,String,String) - (text before match, matching text, text after match) [Either String String] - list of matching and nonmatching strings, if concated, the original string results. Left = notmatching, Right = matching. Bool - whether the string matches () - always returns () (useful in monad context, see below) [String] - list of matches Array Int String - list of substring matches for first match (String, Array Int String) - full matching text and substring matches [(String, Array Int String)] - all matches, full match plus substrings [Array Int String] - all substrings from all matches also, there is the monadic version (=~~) which always behaves exactly the same as (=~) except when the match fails, instead of returning a default value, the monad fails. s !~ re = not (s =~ re) for convinience regular expressions: these may be strings, which are interpreted as regular expressions, or Regex's from the Text.Regex module. or any other instance of the RegexLike class. when using strings, you may prefix the regex by "i/" for a case-insensitive match and "s/" to treat the string as a single line. (or both as "si/") A leading "/" is ignored, other than these cases "/" is not special. advanced features: not just strings can be matched, but rather lists of anything a matcher is defined for. RegexLikeImp data class can be used for in-place code generated by template haskell for compile-time checked regular expresions -} class RegexLike r a | r -> a where matchOnce :: r -> [a] -> Maybe ([a],[a],[a],Array Int [a]) matchTest :: r -> [a] -> Bool matchAll :: r -> [a] -> [Either [a] ([a],Array Int [a])] matchShow :: r -> String -- for error messages matchTest r xs = isJust (matchOnce r xs) matchAll r xs = case matchOnce r xs of Nothing -> pn xs [] Just (p,m,rest,as) -> pn p (Right (m,as):matchAll r rest) where pn x = if null x then id else (Left x:) matchShow _ = "Unknown" instance RegexLike Regex Char where matchOnce re xs = fmap f (matchRegexAll re xs) where f (x,y,z,ls) = (x,y,z,listArray (1,length ls) ls) matchShow _ = "Regex" instance RegexLike String Char where matchOnce re xs = fmap f (matchRegexAll (mr re) xs) where f (x,y,z,ls) = (x,y,z,listArray (1,length ls) ls) mr ('i':'/':re) = mkRegexWithOpts re True False mr ('s':'/':re) = mkRegexWithOpts re False True mr ('i':'s':'/':re) = mkRegexWithOpts re False False mr ('s':'i':'/':re) = mkRegexWithOpts re False False mr ('/':re) = mkRegex re mr (re) = mkRegex re matchShow re = re class RegexContext x a where (=~) :: RegexLike r x => [x] -> r -> a (=~~) :: (Monad m, RegexLike r x) => [x] -> r -> m a -- s =~~ re = return (s =~~ re) not default because probably not what you want s !~ re = not (s =~ re) regexFailed re = fail $ "regex failed to match: " ++ matchShow re instance RegexContext x Int where s =~ re = let xs = matchAll re s in length [x | Right x <- xs] s =~~ re = case (s =~ re) of 0 -> regexFailed re xs -> return $ xs instance RegexContext x ([x],[x],[x]) where s =~ re = case matchOnce re s of Nothing -> (s,[],[]) Just (x,y,z,_) -> (x,y,z) s =~~ re = case matchOnce re s of Nothing -> regexFailed re Just (x,y,z,_) -> return (x,y,z) instance RegexContext x [Either [x] [x]] where s =~ re = map f $ matchAll re s where f (Left s) = Left s f (Right (x,_)) = Right x s =~~ re = case (s =~ re) of [Left _] -> regexFailed re xs -> return $ xs instance RegexContext x [x] where s =~ re = case matchOnce re s of Nothing -> [] `asTypeOf` s Just (_,s,_,_) -> s s =~~ re = case matchOnce re s of Nothing -> regexFailed re Just (_,s,_,_) -> return s -- useful in non-monad context instance RegexContext x Bool where s =~ re = matchTest re s s =~~ re = case s =~ re of False -> regexFailed re True -> return True -- useful in monad context instance RegexContext x () where s =~ re = () s =~~ re = case s =~ re of False -> regexFailed re True -> return () instance RegexContext x [[x]] where s =~ re = [x | Right (x,_) <- matchAll re s] s =~~ re = case (s =~ re) of [] -> regexFailed re xs -> return xs instance RegexContext x [([x],Array Int [x])] where s =~ re = [x | Right x <- matchAll re s] s =~~ re = case (s =~ re) of [] -> regexFailed re xs -> return xs instance RegexContext x [Array Int [x]] where s =~ re = [x | Right (_,x) <- matchAll re s] s =~~ re = case (s =~ re) of [] -> regexFailed re xs -> return xs instance RegexContext x (Array Int [x]) where s =~ re = case matchOnce re s of Nothing -> listArray (1,0) [] Just (_,_,_,z) -> z s =~~ re = case matchOnce re s of Nothing -> regexFailed re Just (_,_,_,z) -> return z -- this is used for template haskell to generate compile-time parsed regular -- expressions data RegexLikeImp a = RegexLikeImp { reImpMatchOnce :: [a] -> Maybe ([a],[a],[a],Array Int [a]), reImpMatchTest :: [a] -> Bool, reImpMatchAll :: [a] -> [Either [a] ([a],Array Int [a])], reImpMatchShow :: String -- for error messages } instance RegexLike (RegexLikeImp a) a where matchOnce = reImpMatchOnce matchTest = reImpMatchTest matchAll = reImpMatchAll matchShow = reImpMatchShow -- --------------------------------------------------------------------------- John Meacham - California Institute of Technology, Alum. - [EMAIL PROTECTED] -------------------------------------------------------------------------- _______________________________________________ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell