Re: ? quantifier like in regexp

Jeffrey Kegler Wed, 14 May 2014 08:33:28 -0700

The ? quantifier is harder than it may seem, because it pairs a nulling rule and a regular rule, and there are all sorts of tricky aspects about this. This is already done in star rules (A ::= B*), but most of the tricky stuff is handled at a low level -- in the Libmarpa C code. So, to do this, I either hack the inner part of Marpa, or else execute the same logic twice, once at the Perl level and once at the C level, with all the interaction issues that implies.

It would be straightforward to do this in a front-end/wrapper to the SLIF, and several people have aired the idea of doing this, but so far nobody has taken it on.
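
To make the front-end idea concrete, here is a minimal sketch in plain Perl of what such a wrapper might do before handing the grammar text to the SLIF. Everything in it is an assumption for illustration: the helper name expand_optional, the _opt suffix, and the choice of ::= as the default operator are not part of Marpa. It rewrites each "name?" into "name_opt" and appends the regular rule and the empty rule that together define the optional symbol. It does not skip quoted strings or character classes, so a literal '?' inside those would be mangled.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper (not part of Marpa): rewrite every "name?" in a
# grammar string to "name_opt" and append the two rules that define it.
# Limitation: quoted strings and character classes are not skipped.
sub expand_optional {
    my ($grammar, $op) = @_;
    $op //= '::=';    # assumption: the optional symbols are structural rules

    my (@order, %seen);    # optional symbols, in order of first appearance
    $grammar =~ s{\b([A-Za-z_]\w*)\?}{
        push @order, $1 unless $seen{$1}++;
        $1 . '_opt'
    }gex;

    # Make sure the rewritten grammar ends with exactly one newline.
    $grammar =~ s/\s*\z/\n/;

    # For each optional symbol: one regular rule and one empty rule.
    my @rules = map { ("${_}_opt $op $_", "${_}_opt $op") } @order;
    return $grammar . join '', map { "$_\n" } @rules;
}

my $src = <<'_';
duration ::= 'P' year? month?
year ::= posnum 'Y'
month ::= posnum 'M'
_

print expand_optional($src);
```

The printed grammar has year? and month? replaced by year_opt and month_opt, followed by four generated rules (a regular and an empty rule for each). Doing the expansion on the text keeps all the nulling logic inside Marpa itself, at the cost of the generated symbol names showing up in traces and error messages.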


-- jeffrey

On 05/13/2014 07:50 AM, Steven Haryanto wrote:

Greetings Jeffrey and all,
I'm just getting started with Marpa::R2::Scanless. I have a lot of high hopes related to using Marpa:
* (currently in progress) migrate Language::Expr from using Regexp::Grammars. With RG there are lots of problems: limitations when writing grammar (e.g. have to avoid left recursion), exponential parsing time as the length of input string increases, the whole debacle of failure to run under Perl 5.18, limited error message/diagnostics, reentrancy problem (can't use regex matching in action code), etc.
* migrate Org::Parser from parsing with regex, with the hope of speeding up the parsing (and improving the readability of the parser code :-) ). Parsing my 400KB (8000 lines) todo.org file takes about 0.8-1s on my Core i7-4770 PC (and probably a couple of seconds on my Core i5 laptop); I wish the time could go down to at least 0.1-0.2s.
* write a markdown parser and markdown-to-POD converter. The current Markdown::POD module uses Markdent, which is Moose-based and has a heavy startup cost, about 0.4s on a fast computer and 1+s on a rather slow one, which is annoying for command-line scripts. It also has trouble parsing _ (emphasis), causing text like 'some_identifier and another_identifier' to be converted to POD 'someI<identifier and another>identifier'.
* rewrite my Ledger::Parser to use Marpa and increase its compliance and feature support.
* write more parsers and converters for other formats, which I so far haven't done because the tools I had at hand are just Perl regex and Regexp::Grammars.
For now I'm playing and exercising with some simple grammars. I was wondering whether Marpa BNF can (or will) support the "zero-or-one" quantifier ? like commonly found in regexp. This is convenient when stating a list of things that are optional but need to be in order. For example, consider the case of parsing ISO 8601 date duration (I apologize in advance for using MarpaX::Simple, it's just a thin wrapper to keep things as simple and as short as possible):
----------------------------
#!/usr/bin/env perl

# parses ISO 8601 duration literal

use 5.010;
use MarpaX::Simple qw(gen_parser);

my $parser = gen_parser(
grammar => <<'_',
:start ::= duration_literal

duration_literal ~ 'P' year_opt month_opt week_opt day_opt
| 'P' year_opt month_opt week_opt day_opt 'T' hour_opt minute_opt second_opt
year_opt ~ posnum 'Y'
year_opt ~
month_opt ~ posnum 'M'
month_opt ~
week_opt ~ posnum 'W'
week_opt ~
day_opt ~ posnum 'D'
day_opt ~
hour_opt ~ posnum 'H'
hour_opt ~
minute_opt ~ posnum 'M'
minute_opt ~
second_opt ~ posnum 'S'
second_opt ~

posnum ~ digits
    | digits '.' digits
digits ~ [0-9]+
_
);

$parser->('P');
$parser->('P1Y');
$parser->('P2M');
$parser->('P2MT2M');
---------------------
It would be nice if I could write (or can I?) something like this, as in a regexp:
---------------------
duration_literal ~ 'P' year? month? week? day?
    | 'P' year? month? week? 'T' hour? minute? second?

year ~ posnum 'Y'
month ~ posnum 'M'
# and so on
---------------------

Expect more (stupid) questions from me :-)

Regards,
Steven
--
You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.

