> The regexp in question looks like: ".*some text.*". When used with
> string-match, removing both ".*"s shouldn't affect the matching
> semantically, right?
Differences (off the top of my head):
- match-beginning (as well as the return value of string-match)
- .* only matches chars other than \n
- .* matches greedily, so (string-match "foo" "foofoo") will only match the
first "foo", whereas (string-match ".*foo" "foofoo") will match the whole
string (the .* matches the first "foo").
> If so, would it be wise to add a regexp simplification step that, for
> example, could remove leading and trailing ".*"s in regular expressions
> passed to string-match?
In the code that calls string-match, yes, but within string-match it would
be terribly difficult to figure out whether the optimization is harmless.
> I realise now is not the time to talk about feature additions, but it
> would certainly make my life simpler if any regular expression that *can*
> be compiled to a DFA is.
I don't think you'll find much disagreement here.
> (I also wonder where "regular expressions" that are more than regular
> came from. Just because they are useful?)
No idea. Originally, only the \N backref was non-regular, but PCRE
introduced many others. Note that using a DFA to implement unix
regexp-matching is actually pretty difficult because of the precise
semantics of sub-group matching. So most "early" implementations used
backtracking, which made non-regular extensions "free".
Stefan
_______________________________________________
emacs-pretest-bug mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug