[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

Sebastian Fischer Wed, 28 Jul 2010 05:39:47 -0700

Have you thought about supporting anchors like (^) and ($) ?

We went the opposite route, made full matching the default, andimplemented partial matching by pre- and appending .*

As there are both a fullMatch and a partialMatch function, I don't seean immediate need for anchors, although I admit that they have theadvantage that you can specify *in the regexp* whether you want fullor partial matching.

You also wrote

I also want to add substring matching,
i.e., the possibility to find out against which strings parenthesized
parts of the regexp were matched.


Getting a good specification for the substring rules is non-trivial.
When writing regex-tdfa (on hackage) I consulted Glenn Fowler's
description at

http://www2.research.att.com/~gsf/testregex/re-interpretation.html


which are well-defined rules for POSIX substring capture.

Thanks for the pointer. I'll consult it when going for submatchextraction.

In the spirit of your DFA I am guessing the "mark" type for substring
capture would actually be an annotated copy of the whole regular
expression tree.

I am not sure whether it is a good idea to implement submatch captureonly by means of a different semiring. I would not hesitate to extendthe matching algorithm itself, if that leads to a clean implementation.

But regex-tdfa is quite old and over-complicated and while it islinear

in complexity it is not especially fast.  I am currently rewriting the
algorithm in OCaml (to learn OCaml) and I am trying to avoid the
/a{2,5}/ blowup that regex-tdfa has.

Did you experience problems with such blowup? The re2 library alsoimplements bounded repetitions by blowing them up, so, at least, itdoesn't seem to be a big problem for Google.


Cheers,
Sebastian

--
Underestimating the novelty of the future is a time-honored tradition.
(D.G.)



_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

Reply via email to