Andrei Alexandrescu, el 18 de febrero a las 21:35 me escribiste:
> I'm almost done rewriting the regular expression engine, and some pretty 
> interesting things have transpired.
> 
> First, I separated the engine into two parts, one that is the actual regular 
> expression engine, and the other that is the state of the match with some 
> particular input. The previous code combined the two into a huge class. The 
> engine (written by Walter) translates the regex string into a 
> bytecode-compiled 
> form. Given that there is a deterministic correspondence between the regex 
> string and the bytecode, the Regex engine object is in fact invariant and 
> cached by 
> the implementation. Caching makes for significant time savings even if e.g. 
> the user repeatedly creates a regular expression engine in a loop.
> 
> In contrast, the match state depends on the input string. I defined it to 
> implement the range interface, so you can either inspect it directly or 
> iterate it 
> for all matches (if the "g" option was passed to the engine).
> 
> The new codebase works with char, wchar, and dchar and any random-access 
> range as input (forward ranges to come, and at some point in the future input 
> ranges 
> as well). In spite of the added flexibility, the code size has shrunk from 
> 3396 lines to 2912 lines. I plan to add support for binary data (e.g. ubyte - 
> handling binary file formats can benefit a LOT from regexes) and also, 
> probably unprecedented, support for arbitrary types such as integers, 
> floating point 
> numbers, structs, what have you. any type that supports comparison and ranges 
> is a good candidate for regular expression matching. I'm not sure how regular 
> expression matching can be harnessed e.g. over arrays of int, but I suspect 
> some pretty cool applications are just around the corner. We can introduce 
> that 
> generalization without adding complexity and there is nothing in principle 
> opposed to it.
> 
> The interface is very simple, mainly consisting of the functions regex(), 
> match(), and sub(), e.g.
> 
> foreach (e; match("abracazoo", regex("a[b-e]", "g")))
>     writeln(e.pre, e.hit, e.post);
> auto s = sub("abracazoo", regex("a([b-e])", "g"), "A$1");

BTW, why are the flags passed as string and not as an integer mask? For
example:
auto s = regex.sub("abracazoo", regex.regex("a([b-e])", regex.G), "A$1");

This way you can catch a few errors at compile-time.


-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
When I was a child I had a fever
My hands felt just like two balloons.
Now I've got that feeling once again
I can't explain you would not understand
This is not how I am.
I have become comfortably numb.

Reply via email to