I agree with Christopher's post almost word for word, with one exception:

It is pointless to wish that Marpa's audience were something other than it is. Marpa has to be presented to the programming profession we have, not the one we'd like, even if the one we'd like is arguably better.

And you might view a regex processor which internally picks the engine for you (that is, Marpa vs. a conventional regex engine) as being like the change from manual shift to automatic in cars. I avoided buying an automatic up to the point where it'd cost me serious $$ to stay manual. But at this point I find the automatic makes fairly reasonable decisions, and I may be better off watching the road than the gearbox. Most of my fellow drivers don't know how to shift manually. Some don't even know what a gearbox does. But are they really worse drivers than the ones I grew up with?

I grew up in a world where BNF was better known than regular expressions. That world is like bell-bottom jeans, with one difference: bell-bottoms might come back.

-- jeffrey

On 06/09/2014 07:10 PM, Christopher Layne wrote:
Personally I think people need to know when not to use a regex and when to 
switch to a proper grammar/parser. I've been in that boat myself and made the 
mistake more than a few times.

When the regex itself is actually more complicated and difficult to understand 
vs the grammar, the value of continuing to use it is gone. At that point it's 
unlikely the particular regex will even outperform a grammar based solution as 
it is. Self-forcing of regex in all cases because it's familiar is pretty 
irrational but probably a consequence of the vast majority of people using them 
not having a lot of experience with proper parsers. At the end of the day, 
we're all writing programmatic scrolls for little state machines as encoded by 
the particular language chosen (regex, grammar, etc) but the language chosen to 
write them should be sane. Why are regexes chosen so often? Familiarity and a 
false sense of programmer efficiency. Any non-trivial regex eventually turns 
into a pretty ridiculously large pattern with multiple alternatives that 
wouldn't even be readable without an /x flag. Individuals keep adding more to 
said patterns, all the while just sinking costs into something where they 
should just stop the madness and switch to a classic grammar/parser approach.

From a technical perspective, grammars should be considered once multiple alternative, but valid, patterns 
show up that require stateful logic. The splitting of rules vs tokens as a 
generalized parsing approach is quite clean from an abstraction POV as well. We have rules, they 
define the way something should look and the order of elements that fit into the rules. We have 
tokens, they define what something actually is as coming from a sequence of bits/bytes. In a sense, 
such languages separate "code" from "data" and will always win from a 
maintainability standpoint because the approach is inherently structured and organized, and carries less 
baked-in data.
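
The rules-vs-tokens split described above can be sketched in plain Perl. This is only an illustration, not Marpa and not anyone's production code: the names @TOKENS, tokenize and parse_sum are made up for the example, and the one-rule "grammar" (a sum of integers) is an assumption chosen to echo the benchmark pattern later in the thread.

```perl
use strict;
use warnings;

# Tokens: what a piece of input *is*, defined over raw characters.
# (Hypothetical table for this sketch.)
my @TOKENS = (
    [ NUMBER => qr/\d+/ ],
    [ PLUS   => qr/\+/  ],
);

# Lexer: turn a string into a list of [type, text] pairs, skipping whitespace.
sub tokenize {
    my ($input) = @_;
    my @out;
    pos($input) = 0;
    while (1) {
        $input =~ /\G\s+/gc;                      # skip whitespace, keep position
        last if pos($input) >= length $input;
        my $matched = 0;
        for my $def (@TOKENS) {
            my ($type, $re) = @$def;
            if ($input =~ /\G($re)/gc) {
                push @out, [ $type, $1 ];
                $matched = 1;
                last;
            }
        }
        die "Unexpected character at offset " . pos($input) . "\n"
            unless $matched;
    }
    return @out;
}

# Rule: what order the tokens must come in.
#   sum ::= NUMBER ( PLUS NUMBER )*
sub parse_sum {
    my @tokens = @_;
    my $first = shift @tokens;
    die "Expected NUMBER\n" unless $first && $first->[0] eq 'NUMBER';
    my $total = $first->[1];
    while (@tokens) {
        my $op  = shift @tokens;
        my $num = shift @tokens;
        die "Expected PLUS\n"   unless $op->[0] eq 'PLUS';
        die "Expected NUMBER\n" unless $num && $num->[0] eq 'NUMBER';
        $total += $num->[1];
    }
    return $total;
}

print parse_sum(tokenize("1 + 2 + 39")), "\n";    # prints 42
```

Adding a new token type only touches @TOKENS, and changing the allowed order only touches parse_sum, which is the maintainability point being made here.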

It also helps that in most of the non-trivial cases they're usually faster too.

On Jun 9, 2014, at 1050 PT, Jeffrey Kegler <[email protected]> 
wrote:

By the way, another target of opportunity is a regex engine which detects "hard" and 
"easy" regexes.  Most regexes it would handle in the ordinary way, with a regex engine, 
but the hard ones it hands over to Marpa.  This might prove popular because people *want* to do 
everything with a regex.  This would allow them to.  It'd make a great Perl extension.
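
One way such a dispatcher might decide between "hard" and "easy" is sketched below. Everything here is an assumption made for illustration: the function name classify_pattern, the threshold of 8 alternatives, and the heuristic itself are hypothetical, and no Marpa calls are shown. A real detector would need to parse the pattern properly instead of scanning its text.

```perl
use strict;
use warnings;

# Hypothetical heuristic: a pattern is "hard" if it uses recursion or has
# more than $max_alternatives '|' branches; otherwise it is "easy".
sub classify_pattern {
    my ($source, $max_alternatives) = @_;
    $max_alternatives //= 8;    # arbitrary threshold for this sketch

    # Recursive subpatterns such as (?R), (?1), (?-1) or (?&name) describe
    # nested structure, which is where a grammar is the more natural tool.
    return 'hard' if $source =~ /\(\?(?:R|[+-]?\d+|&\w+)\)/;

    # Count '|' not preceded by a backslash as a rough proxy for the number
    # of alternatives. Crude: it miscounts '|' inside character classes.
    my $alternatives = () = $source =~ /(?<!\\)\|/g;
    return $alternatives > $max_alternatives ? 'hard' : 'easy';
}

# Which engine each class would be handed to (labels only, no real dispatch).
my %engine_for = (
    easy => 'ordinary regex engine',
    hard => 'grammar-based parser',
);

for my $pattern ('\d+(\s*\+\s*\d+)*', '\((?:[^()]++|(?R))*\)') {
    printf "%-24s -> %s\n", $pattern, $engine_for{ classify_pattern($pattern) };
}
```

The caller keeps writing regex syntax either way; only the choice of engine behind it changes.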

-- jeffrey

On 06/09/2014 10:03 AM, Steven Haryanto wrote:
Thanks for the answer and explanation. I see that the second approach is about 
50% faster on my PC. Although speed-wise it's not on par with regex for this 
simple case[*], it's interesting nevertheless and will be useful in certain 
cases.

*) Did a simple benchmark on the string ("a" x 1000) . " 1+2 " . ("a" x 1000). 
With a regex search, while ($input =~ /(\d+(\s*\+\s*\d+)*)/g) { ... }, I get around 250k searches/sec. With the 
Marpa grammars I get roughly 200/sec and 300/sec.
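
For anyone wanting to reproduce the regex half of that measurement, a minimal sketch using the core Benchmark module might look like this. The helper name count_matches is made up for the example, and the Marpa side is left out because the grammars are not included in this message. Rates will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# The input from the footnote: one small sum buried in 2000 filler characters.
my $input = ("a" x 1000) . " 1+2 " . ("a" x 1000);

# Run the search loop from the footnote and count how many sums it finds.
sub count_matches {
    my ($text) = @_;
    my $count = 0;
    while ($text =~ /(\d+(\s*\+\s*\d+)*)/g) {
        $count++;
    }
    return $count;
}

print "matches found: ", count_matches($input), "\n";

# A negative count asks Benchmark to run for at least that many CPU seconds
# and report the achieved rate.
timethis(-1, sub { count_matches($input) });
```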

Regards,
Steven

