Re: Finding a parse inside a (potentially long) string?

Jeffrey Kegler Mon, 09 Jun 2014 21:08:19 -0700

I agree with Christopher's post just about word-for-word, but with one exception, which is this:

Wishing Marpa's audience were something other than it is, is pointless. Marpa has to be presented to the programming profession we have, not the one we'd like, even if the one we'd like is arguably better.

And you might compare a regex processor which internally picks the engine for you (that is, Marpa vs. regular expression) to the change from manual shift to automatic in cars. I avoided buying an automatic up to the point where it'd cost me serious $$ to stay manual. But at this point I find the automatic makes fairly reasonable decisions, and I may be better off watching the road than the gearbox. Most of my fellow drivers don't know how to drive a manual. Some don't even know what a gearbox does. But are they really worse drivers than the ones I grew up with?

I grew up in a world where BNF was better known than regular expressions. That world is like bell-bottom jeans, with one difference: bell-bottoms might come back.


-- jeffrey

On 06/09/2014 07:10 PM, Christopher Layne wrote:

Personally I think people need to know when not to use a regex and when to
switch to a proper grammar/parser. I've been in that boat myself and made the
mistake more than a few times.

When the regex itself is actually more complicated and harder to understand
than the grammar, the value of continuing to use it is gone. At that point it's
unlikely the particular regex will even outperform a grammar-based solution.
Forcing a regex onto every case because it's familiar is pretty irrational, but
it's probably a consequence of the vast majority of people using regexes not
having a lot of experience with proper parsers. At the end of the day, we're
all writing programmatic scrolls for little state machines as encoded by the
particular language chosen (regex, grammar, etc.), but the language chosen to
write them should be sane. Why are regexes chosen so often? Familiarity and a
false sense of programmer efficiency. Any non-trivial regex eventually turns
into a ridiculously large pattern with multiple alternatives that wouldn't
even be readable without an /x flag. People keep adding to such patterns, all
the while sinking costs into something when they should just stop the madness
and switch to a classic grammar/parser approach.

From a technical perspective, grammars should be considered when multiple valid
alternative patterns show up that require stateful logic. As a general parsing
approach, splitting rules from tokens is also quite clean from an abstraction POV.
We have rules: they define the way something should look and the order of the
elements that fit into it. We have tokens: they define what something actually
is, as read from a sequence of bits/bytes. In a sense, such languages separate
"code" from "data", and they will always win from a maintainability standpoint
because the approach is inherently structured and organized, with less baked-in
data.

It also helps that in most of the non-trivial cases they're usually faster too.
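To make the rules/tokens split concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the thread is about Perl and Marpa, while this is a tiny hand-written recursive-descent parser, and every name in it (`TOKEN_SPEC`, `tokenize`, `parse_sum`, `ParseError`) is hypothetical. The tokens say what a run of characters *is*; the single rule says in what order tokens may appear. The toy language is integer sums like "1 + 2".

```python
import re
from typing import List, Tuple


class ParseError(ValueError):
    """Raised when the input does not fit the tokens or the rule."""


# Tokens: what a run of characters actually is.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("PLUS", r"\+"),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, lexeme) pairs, discarding whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ParseError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Rule: the order in which tokens may appear.
#   sum ::= NUM (PLUS NUM)*
def parse_sum(tokens: List[Tuple[str, str]]) -> List[int]:
    """Return the operands of a sum, or raise ParseError."""

    def expect(i: int, kind: str) -> str:
        if i >= len(tokens) or tokens[i][0] != kind:
            raise ParseError(f"expected {kind} at token {i}")
        return tokens[i][1]

    operands = [int(expect(0, "NUM"))]
    i = 1
    while i < len(tokens):
        expect(i, "PLUS")
        operands.append(int(expect(i + 1, "NUM")))
        i += 2
    return operands
```

For example, `parse_sum(tokenize("1 + 2+ 30"))` gives `[1, 2, 30]`, while `"1+"` raises `ParseError`. Adding a new kind of token or a new rule touches one place each, which is the maintainability point being made above.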

On Jun 9, 2014, at 10:50 PT, Jeffrey Kegler <[email protected]>
wrote:

By the way, another target of opportunity is a regex engine which detects "hard" and 
"easy" regexes.  Most regexes it would handle in the ordinary way, with a regex engine, 
but the hard ones it hands over to Marpa.  This might prove popular because people *want* to do 
everything with a regex.  This would allow them to.  It'd make a great Perl extension.
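A minimal Python sketch of the dispatch idea, assuming a deliberately crude "hard pattern" test: a quantified group whose body ends in a quantifier, as in (a+)+. That heuristic is an assumption, not a real analysis, and it over-reports (for instance, it flags the (\s*\+\s*\d+)* part of Steven's pattern, which is not a typical problem case). The `fallback` argument is a hypothetical stand-in for a grammar-based engine such as Marpa; no actual Marpa binding is assumed.

```python
import re
from typing import Callable, Optional

# Crude, hypothetical heuristic: a parenthesized group (with no nested
# parentheses) whose last item is quantified, and which is itself
# quantified, e.g. (a+)+ or (?:a*)*. Such shapes are a classic source of
# exponential backtracking in backtracking regex engines.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*]\)[+*{]")


def is_hard(pattern: str) -> bool:
    """Rough guess at whether a pattern is 'hard' for a backtracking engine."""
    return NESTED_QUANTIFIER.search(pattern) is not None


def search(pattern: str, text: str,
           fallback: Callable[[str, str], Optional[str]]) -> Optional[str]:
    """Use the ordinary regex engine for easy patterns; hand hard ones to `fallback`."""
    if is_hard(pattern):
        return fallback(pattern, text)
    m = re.search(pattern, text)
    return m.group() if m else None
```

The caller never chooses the engine, which is the "automatic transmission" property: `search(r"\d+", "abc 42", fallback=...)` goes straight to the regex engine, while `search(r"(a+)+$", ...)` is routed to whatever grammar-based matcher is plugged in.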

-- jeffrey

On 06/09/2014 10:03 AM, Steven Haryanto wrote:

Thanks for the answer and explanation. I see that the second approach is about 
50% faster on my PC. Although speed-wise it's not on par with regex for this 
simple case[*], it's interesting nevertheless and will be useful in certain 
cases.

*) I did a simple benchmark on the string ("a" x 1000) . " 1+2 " . ("a" x 1000).
With a regex search, while ($input =~ /(\d+(\s*\+\s*\d+)*)/g) { ... }, I get around
250k searches/sec. With the two Marpa grammars I get about 200/sec and about
300/sec, respectively.
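For anyone wanting to rerun the regex side of this comparison, here is a rough Python port of the benchmark. It is an assumption-laden sketch, not Steven's actual script: it uses Python's `re` and `timeit` instead of Perl, the names `find_sums` and `searches_per_second` are hypothetical, only the regex half is reproduced (not the Marpa grammars), and the rates it reports depend on the machine and will not match the Perl figures.

```python
import re
import timeit

# Same pattern as the Perl benchmark: an integer followed by zero or more
# "+ integer" terms, with optional whitespace around each '+'.
SUM_RE = re.compile(r"(\d+(\s*\+\s*\d+)*)")

# Same test input: 1000 'a's, " 1+2 ", then 1000 more 'a's.
INPUT = "a" * 1000 + " 1+2 " + "a" * 1000


def find_sums(text: str) -> list:
    """Return every match, like Perl's while ($input =~ /.../g) loop."""
    return [m.group(1) for m in SUM_RE.finditer(text)]


def searches_per_second(number: int = 1_000) -> float:
    """Time `number` full scans of INPUT and return scans per second."""
    seconds = timeit.timeit(lambda: find_sums(INPUT), number=number)
    return number / seconds


if __name__ == "__main__":
    print(find_sums(INPUT))
    print(f"{searches_per_second():,.0f} searches/sec")
```

On this input `find_sums(INPUT)` returns `["1+2"]`; the printed rate is whatever your machine produces.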

Regards,
Steven



--
You received this message because you are subscribed to the Google Groups "marpa 
parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
