On Thu, Dec 15, 2005 at 09:56:09PM +0000, Luke Palmer wrote:

> On 12/15/05, Brad Bowman <[EMAIL PROTECTED]> wrote:
> > Why does the longest input sequence win?  Is it for some consistency
> > that I'm not seeing?  Some exceedingly common use case?  The rule
> > seems unnecessarily restrictive.
>
> Hmm.  Good point.  You see, the longest token wins because that's an
> exceedingly common rule in lexers, and you can't sort regular
> expressions the way you can sort strings, so there needs to be special
> machinery in there.
>
> There are two rather weak arguments to keep the longest token rule:
>
>     * We could compile the transliteration into a DFA and make it
>       fast.  Premature optimization.
>     * We could generalize transliteration to work on rules as well.
>
> In fact, I think the first Perl module I ever wrote was
> Regexp::Subst::Parallel, which did precisely the second of these.
> That's one of the easy things that was hard in Perl (but I guess
> that's what CPAN is for).  Hmm.. none of these is really a compelling
> argument either way.
If a shorter rule is allowed to match first, then any longer rule sharing
that prefix can effectively be removed from the match set, at least for
constant string matches.  If, for example, '=' can match without '=='
being tried first, then '==' will never match at all, short of syntactic
help to force a backtracking retry.

--
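A minimal sketch of the shadowing problem, in Python rather than Perl 6
(the `substitute` helper and its pair ordering are illustrative
assumptions, not anything from the design documents): a naive parallel
substitution that tries candidates in declaration order will consume '='
at each position and so can never reach '==', whereas ordering by
descending length (the longest-token rule) lets '==' win.

```python
def substitute(text, pairs):
    """Parallel substitution: at each position, try pairs in the
    order given and take the first literal that matches."""
    out = []
    i = 0
    while i < len(text):
        for src, dst in pairs:
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            # No pair matched here; copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)

# Declaration order with '=' first: '==' is shadowed and never matches.
print(substitute("a == b", [("=", "ASSIGN"), ("==", "EQ")]))
# -> a ASSIGNASSIGN b

# Longest-token order: '==' is tried first and matches as a unit.
print(substitute("a == b", [("==", "EQ"), ("=", "ASSIGN")]))
# -> a EQ b
```

Sorting constant strings by descending length before matching recovers
the longest-token behaviour without any special machinery, which is why
it is the common lexer default; the cost, as noted above, is that the
shorter-first ordering is simply unavailable to the user.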