Hi Sjoerd,
It seems that only shift needs the reg field of the RegW datatype.
So you can also replace the reg field with a shift field. This makes
the regexp parser extensible, as there is no longer a dependence on
the (closed) datatype Reg:
data RegW w c = RegW { active :: !Bool,
empty :: !w,
final_ :: !w,
shift :: w -> c -> RegW w c }
Interesting observation. However, such an encoding would prevent the
definition of some other functions on RegExp. More specifically, there
are Show and Eq instances for the QuickCheck tests.
For example it is then easy to define the parser that matches
nothing, which is the identity element of alt:
noMatch :: RegExp c
noMatch = RegExp noMatchW
noMatchW :: Semiring w => RegW w c
noMatchW = RegW False zero zero $ \_ _ -> noMatchW
Note that you can also define it with the current interface:
noMatch :: RegExp c
noMatch = psym "(Big Lambda)" (const False)
Maybe I'll add it to the next version. I only need a better string
representation ;)
But otherwise I do wonder if the parser needs to be extensible.
I have some ideas for extending the matcher. For example /a{2,5}/ is
currently translated into /aa(a(a(a)?)?)?/ but it may be possible to
handle it without such blowup. I also want to add substring matching,
i.e., the possibility to find out against which strings parenthesized
parts of the regexp were matched.
But as the closed Reg type is not exported I can freely change it
along with any matcher extension.
For example some XML Schema implementations that are based on finite
automata have special cases for the xs:all construct, which matches
a list of elements, each occurring once in any order. But I tried a
straightforward implementation and it works fine:
eachOnce :: [RegExp c] -> RegExp c
eachOnce [] = eps
eachOnce ps = eachOnce' ps [] where
eachOnce' [] _ = noMatch
eachOnce' (p:ps) qs = (p `seq_` eachOnce (ps ++ qs)) `alt`
eachOnce' ps (p:qs)
Neat! That's also worth adding. I find
eachOnce :: [RegExp c] -> RegExp c
eachOnce = foldr alt noMatch . map (foldr seq_ eps) . permutations
even clearer but your version is *much* better as it uses nesting to
combine all alternatives that start with the same regexp.
Thanks!
Sebastian
--
Underestimating the novelty of the future is a time-honored tradition.
(D.G.)
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe