Re: ? quantifier like in regexp

Jeffrey Kegler Thu, 15 May 2014 07:50:41 -0700

On reflection, I think I will add ? quantification in the Libmarpa C code. If it's in the C code, it will be available universally. Also, at this point, the Perl layer has to support layers of legacy code, and changes of this sort at the Perl level tend to paint me further into a corner.

The bad news is that it may take me a while: it has to come after several other changes. A good way to track this (and make sure I don't forget) is to create a GitHub issue.


-- jeffrey

On 05/14/2014 08:32 AM, Jeffrey Kegler wrote:

The ? quantifier is harder than it may seem, because it pairs a nulling rule and a regular rule, and there are all sorts of tricky aspects to this. This is already done in star rules (A ::= B*), but most of the tricky stuff is handled at a low level, in the Libmarpa C code. So, to do this, I either hack the inner part of Marpa, or else execute the same logic twice, once at the Perl level and once at the C level, with all the interaction issues that implies.
It would be straightforward to do this in a front-end/wrapper to the SLIF, and several people have aired the idea of doing this, but so far nobody has taken it on.
-- jeffrey
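A front-end of the kind mentioned above could start as a plain textual rewrite of the BNF before it is handed to the SLIF. The sketch below is hypothetical: `expand_optional` is not part of Marpa::R2 or MarpaX::Simple. It only handles the simple form `symbol?`, replacing each occurrence with a `symbol_opt` helper and appending the pair of rules (one matching `symbol`, one empty) that the quantifier stands for. It does none of the low-level nulling-rule handling that star rules get inside Libmarpa.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# Hypothetical front-end helper (NOT part of Marpa::R2): rewrites "sym?" in
# SLIF-style BNF text into "sym_opt" and appends two rules per such symbol:
#   sym_opt OP sym     (the symbol is present)
#   sym_opt OP         (empty rule: the symbol is absent)
# OP is the operator ("::=" or "~") of the rule where "sym?" first appeared.
# Limitation: this is a purely textual rewrite, so a "?" right after a word
# character inside a quoted string or character class would also be rewritten.
sub expand_optional {
    my ($bnf) = @_;
    my %op_for;       # optional symbol => operator for its generated rules
    my @order;        # optional symbols, in first-seen order
    my $op = '::=';   # operator of the most recent rule line
    my @out;

    for my $line (split /\n/, $bnf) {
        # Continuation lines ("| ...") inherit the previous rule's operator.
        $op = $1 if $line =~ /(::=|~)/;
        $line =~ s{\b(\w+)\?}{
            my $sym = $1;
            unless (exists $op_for{$sym}) {
                $op_for{$sym} = $op;
                push @order, $sym;
            }
            "${sym}_opt";
        }ge;
        push @out, $line;
    }

    for my $sym (@order) {
        push @out, "${sym}_opt $op_for{$sym} $sym";
        push @out, "${sym}_opt $op_for{$sym}";
    }
    return join("\n", @out) . "\n";
}

my $grammar = <<'BNF';
duration_literal ::= 'P' year? month?
year  ::= posnum 'Y'
month ::= posnum 'M'
BNF

print expand_optional($grammar);
```

For the small grammar in the block, the rewritten BNF keeps the three original rules (with `year_opt` and `month_opt` substituted) and ends with `year_opt ::= year`, `year_opt ::=`, `month_opt ::= month` and `month_opt ::=`, which is the same hand expansion used in the grammar further down this thread.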

On 05/13/2014 07:50 AM, Steven Haryanto wrote:
Greetings Jeffrey and all,
I'm just getting started with Marpa::R2::Scanless. I have high hopes for using Marpa:
* (currently in progress) migrate Language::Expr from using Regexp::Grammars. With RG there are lots of problems: limitations when writing a grammar (e.g. having to avoid left recursion), exponential parsing time as the length of the input string increases, the whole debacle of failing to run under Perl 5.18, limited error messages/diagnostics, a reentrancy problem (can't use regex matching in action code), etc.
* migrate Org::Parser from parsing with regexes, in the hope of speeding up parsing (and improving the readability of the parser code :-) ). Parsing my 400KB (8000-line) todo.org file takes about 0.8-1s on my Core i7-4770 PC (and probably a couple of seconds on my Core i5 laptop); I'd like it to go down to 0.1-0.2s or less.
* write a Markdown parser and a Markdown-to-POD converter. The current Markdown::POD module uses Markdent, which is Moose-based and has a heavy startup cost, about 0.4s on a fast computer and 1+s on a rather slow one, which is annoying for command-line scripts. It also has trouble parsing _ (emphasis), causing text like 'some_identifier and another_identifier' to be converted to the POD 'someI<identifier and another>identifier'.
* rewrite my Ledger::Parser to use Marpa and improve its compliance and feature support.
* write more parsers and converters for other formats, which I haven't done so far because the only tools at hand were Perl regexes and Regexp::Grammars.
For now I'm playing with and exercising some simple grammars. I was wondering whether Marpa BNF can (or will) support the "zero-or-one" quantifier ?, as commonly found in regexps. This is convenient when stating a list of things that are optional but need to be in order. For example, consider parsing an ISO 8601 duration literal (I apologize in advance for using MarpaX::Simple; it's just a thin wrapper to keep things as simple and as short as possible):
----------------------------
#!/usr/bin/env perl

# parses ISO 8601 duration literal

use 5.010;
use MarpaX::Simple qw(gen_parser);

my $parser = gen_parser(
grammar => <<'_',
:start ::= duration_literal

duration_literal ~ 'P' year_opt month_opt week_opt day_opt
    | 'P' year_opt month_opt week_opt day_opt 'T' hour_opt minute_opt second_opt
year_opt ~ posnum 'Y'
year_opt ~
month_opt ~ posnum 'M'
month_opt ~
week_opt ~ posnum 'W'
week_opt ~
day_opt ~ posnum 'D'
day_opt ~
hour_opt ~ posnum 'H'
hour_opt ~
minute_opt ~ posnum 'M'
minute_opt ~
second_opt ~ posnum 'S'
second_opt ~

posnum ~ digits
    | digits '.' digits
digits ~ [0-9]+
_
);

$parser->('P');
$parser->('P1Y');
$parser->('P2M');
$parser->('P2MT2M');
---------------------
It would be nice if I could write (or can I?) something like this, as in a regexp:
---------------------
duration_literal ~ 'P' year? month? week? day?
    | 'P' year? month? week? day? 'T' hour? minute? second?

year ~ posnum 'Y'
month ~ posnum 'M'
# and so on
---------------------
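Until something like `year?` is available in the BNF, the same optional-but-ordered structure can at least be written with the regexp `?` quantifier the question refers to, which is handy as an independent check on what the hand-expanded `*_opt` rules should accept. This is only a sketch of a test oracle, assuming the duration syntax is exactly the one in the grammar above (a posnum is digits, optionally followed by '.' and more digits); it is not a substitute for the Marpa parser and builds no parse tree.

```perl
use strict;
use warnings;

# Same duration syntax as the hand-written grammar, expressed with the regexp
# "?" (zero-or-one) quantifier: every component is optional, but order is fixed.
my $posnum = qr/[0-9]+(?:\.[0-9]+)?/;    # digits, or digits '.' digits
my $duration_re = qr/
    \A P
    (?: $posnum Y )? (?: $posnum M )? (?: $posnum W )? (?: $posnum D )?
    (?: T (?: $posnum H )? (?: $posnum M )? (?: $posnum S )? )?
    \z
/x;

sub is_duration { return $_[0] =~ $duration_re ? 1 : 0 }

for my $s ('P', 'P1Y', 'P2M', 'P2MT2M', 'P1.5DT3H', 'P2X', 'P1M1Y') {
    printf "%-10s %s\n", $s, is_duration($s) ? 'match' : 'no match';
}
```

Note that 'P1M1Y' is rejected because the year must come before the month; that ordering constraint is exactly what the `?`-per-component style expresses compactly.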

Expect more (stupid) questions from me :-)

Regards,
Steven
--
You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.

