[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-29 Thread Sebastian Fischer
Maybe even write a blog about how it works? A whole blog is probably unnecessary ;) But a blog *post* would be nice! -- Underestimating the novelty of the future is a time-honored tradition. (D.G.) ___ Haskell-Cafe mailing list

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-29 Thread Sebastian Fischer
Hello Chris, thanks for the examples. They show that by adding anchors, the meaning of regular expressions is no longer compositional. For example (^|a) accepts the empty word only if no characters have been read yet. So (^|a) =~ does hold but a(^|a) =~ a does not hold

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-29 Thread Benedikt Huber
Sebastian Fischer schrieb: On Jul 29, 2010, at 12:47 AM, Benedikt Huber wrote: Taking a quick look at the PyPy blog post on JIT code generation for regular expressions, I thought it would be fun to implement a generator using the excellent LLVM bindings for haskell. Interesting. Would you

Re: [Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread S. Doaitse Swierstra
On 27 jul 2010, at 09:30, Eugene Kirpichov wrote: Perhaps this might mean that we can get incremental and parallel regexp matching by associating each character with a linear operator This is exactly what is happening in the uu-parsinglib. Doaitse (matrix) over this or related semiring,

Re: [Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread Eugene Kirpichov
This is very interesting! Could you provide some more info? T.i. where to look in the source, or on the web? 2010/7/28 S. Doaitse Swierstra doai...@swierstra.net: On 27 jul 2010, at 09:30, Eugene Kirpichov wrote: Perhaps this might mean that we can get incremental and parallel regexp

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread Chris Kuklewicz
On 26/07/2010 16:23, Sebastian Fischer wrote: this year's ICFP features A Play on Regular Expressions where two Haskell programmers and an automata theory guru develop an efficient purely functional algorithm for matching regular expressions. That is wonderfully clean way to go straight to a

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread Sebastian Fischer
Have you thought about supporting anchors like (^) and ($) ? We went the opposite route, made full matching the default, and implemented partial matching by pre- and appending .* As there are both a fullMatch and a partialMatch function, I don't see an immediate need for anchors, although

Re: [Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread S. Doaitse Swierstra
On 28 jul 2010, at 13:17, Eugene Kirpichov wrote: This is very interesting! Could you provide some more info? T.i. where to look in the source, or on the web? see: file:///Users/doaitse/.cabal/share/doc/uu-parsinglib-2.4.2/html/index.html The README.hs module contains some further

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread Chris Kuklewicz
As there are both a fullMatch and a partialMatch function, I don't see an immediate need for anchors, although I admit that they have the advantage that you can specify *in the regexp* whether you want full or partial matching. The REG_NEWLINE flag for compiling POSIX regular expressions is

Re: [Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread Martijn van Steenbergen
On 7/28/10 14:53, S. Doaitse Swierstra wrote: see: file:///Users/doaitse/.cabal/share/doc/uu-parsinglib-2.4.2/html/index.html Readers might have more luck with the following URLs: http://hackage.haskell.org/package/uu-parsinglib

Re: [Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread Martijn van Steenbergen
On 7/27/10 9:58, Sebastian Fischer wrote: On Jul 27, 2010, at 9:15 AM, Sjoerd Visscher wrote: Oh, by the way, with noMatch, eps, alt and seq_ RegExp is itself a Semiring, Yes, but it's hard to define an Eq instance for arbitrary regular expressions that reflects equivalence of regexps.

Re: [Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread Sebastian Fischer
Oh, by the way, with noMatch, eps, alt and seq_ RegExp is itself a Semiring, Yes, but it's hard to define an Eq instance for arbitrary regular expressions that reflects equivalence of regexps. How hard is this exactly? The standard algorithm to decide regexp equivalence transforms both

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread Sebastian Fischer
REG_NEWLINE Compile for newline-sensitive matching. By default, newline is a completely ordinary character with no special meaning in either REs or strings. With this flag, `[^' bracket expressions and `.' never match newline, a `^' anchor matches the null string after any newline in the

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-28 Thread Chris Kuklewicz
Maybe I underestimated the utility of ^ and $. The definition seems intricate. I thought about adding a combinator for matching newline but now think that would lead to wrong start and end positions. For example the start position of the matching substring for ^a in a\na should be 2 not

Re: [Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-27 Thread Sjoerd Visscher
On Jul 27, 2010, at 7:09 AM, Sebastian Fischer wrote: I find eachOnce :: [RegExp c] - RegExp c eachOnce = foldr alt noMatch . map (foldr seq_ eps) . permutations even clearer but your version is *much* better as it uses nesting to combine all alternatives that start with

Re: [Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-27 Thread Sjoerd Visscher
On Jul 27, 2010, at 7:09 AM, Sebastian Fischer wrote: I'll add noMatch :: RegExp c noMatch = psym [] (const False) Oh, by the way, with noMatch, eps, alt and seq_ RegExp is itself a Semiring, but I'm not sure what that would do. -- Sjoerd Visscher http://w3future.com

Re: [Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-27 Thread Eugene Kirpichov
Perhaps this might mean that we can get incremental and parallel regexp matching by associating each character with a linear operator (matrix) over this or related semiring, or something, and mixing that with two sigfpe's articles:

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-27 Thread Sebastian Fischer
On Jul 27, 2010, at 9:15 AM, Sjoerd Visscher wrote: Oh, by the way, with noMatch, eps, alt and seq_ RegExp is itself a Semiring, Yes, but it's hard to define an Eq instance for arbitrary regular expressions that reflects equivalence of regexps. There is currently only `instance Eq

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-26 Thread Sjoerd Visscher
Hi Sebastian, I enjoyed this paper very much. Writing papers in the style of a play seems to work very well! (although I think you should spice it up more if your want to get it on Broadway) It seems that only shift needs the reg field of the RegW datatype. So you can also replace the reg

[Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-26 Thread Sebastian Fischer
Hi Sjoerd, It seems that only shift needs the reg field of the RegW datatype. So you can also replace the reg field with a shift field. This makes the regexp parser extensible, as there is no longer a dependence on the (closed) datatype Reg: data RegW w c = RegW { active :: !Bool,

Re: [Haskell-cafe] Re: ANN: weighted-regexp-0.1.0.0

2010-07-26 Thread Sebastian Fischer
On Jul 27, 2010, at 6:57 AM, Sebastian Fischer wrote: Maybe I'll add it [noMatch] to the next version. I only need a better string representation ;) Ha! It's already provided by character classes: ghci accept (fromString []) abc False I'll add noMatch :: RegExp c noMatch