I've done a second version of this <http://scsys.co.uk:8002/391796>, which I think should be faster and, especially, use less memory.

This technique is, I hope, of wide interest. To do an "unanchored" search, it uses Marpa's :discard mechanism. Essentially, it treats strings that are not part of the search target as whitespace.

The SLIF is quite compact, but an explanation may help. I set the grammar up to discard any single character:
:discard ~ [\d\D]
Discard will always be the last choice. Even with LATM, the longest match wins, and since the discard lexeme has length 1, it can at most tie for longest match. Whenever a non-discard lexeme matches, it is at least as long, and on a tie it is preferred. Bottom line: discard always loses, unless there is no other choice.
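To make the tie-breaking rule concrete, here is a toy model in plain Perl of how one lexeme could be chosen at each position. It is not Marpa's lexer: it ignores which lexemes the grammar would accept at that point (the "acceptable" part of LATM), and the `num`/`plus` lexeme table and the `scan` function are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical lexeme table for this toy model: [ name, pattern ].
my @lexemes = (
    [ num  => qr/\d+/ ],
    [ plus => qr/\+/ ],
);

# Return the non-discarded tokens found in $input, as "name:text" strings.
sub scan {
    my ($input) = @_;
    my @tokens;
    my $pos = 0;
    while ( $pos < length $input ) {
        # The discard lexeme [\d\D] matches any single character, so it is
        # always a candidate, with length 1.
        my ( $name, $len ) = ( 'discard', 1 );
        my $rest = substr $input, $pos;
        for my $lexeme (@lexemes) {
            my ( $lex_name, $re ) = @{$lexeme};
            next unless $rest =~ /\A($re)/;
            my $match_len = length($1);
            # A real lexeme is never shorter than 1, so it beats discard
            # outright or on a tie; between real lexemes, only a strictly
            # longer match replaces the current choice.
            if ( $name eq 'discard' or $match_len > $len ) {
                ( $name, $len ) = ( $lex_name, $match_len );
            }
        }
        push @tokens, "${name}:" . substr( $input, $pos, $len )
            unless $name eq 'discard';
        $pos += $len;
    }
    return @tokens;
}

print join( ' ', scan('x 1+23!') ), "\n";    # prints "num:1 plus:+ num:23"
```

Digits and `+` always come out as tokens (and `23` stays whole, because longest match beats two one-digit matches); everything else falls through to discard.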

Then, for your search patterns, you define other lexemes. As just explained, whenever they match they will be preferred. If you want all matches, you make your top rule
string        ::= target+
where <target> is the pattern you are searching for.
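Putting the two pieces together for Steven's example, a grammar might look like the following. This is an untested SLIF fragment that assumes Marpa::R2's Scanless interface; the `any_char` name and the left-recursive `expr` rule (chosen so that 1+2+4 has only one parse) are my own choices, not necessarily what the pastebin version does. With `::array` as the default action, each match comes back as a nested array of its pieces, which a real semantics would flatten.

```
lexeme default = latm => 1
:default ::= action => ::array
:start   ::= string

# Each match of the search target is one element of the sequence.
string   ::= target+
target   ::= expr

# Left recursion: 1+2+4 parses only as (1+2)+4.
expr     ::= num
           | expr '+' num

num        ~ [\d]+

# Every other character is discarded, one at a time. At length 1 the
# discard lexeme can at most tie, and on a tie the real lexeme wins.
:discard   ~ any_char
any_char   ~ [\d\D]
```

Spaces inside "1 + 2" need no special handling: at those positions neither `num` nor `'+'` matches, so the space is discarded like any other non-target character.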

-- jeffrey

On 06/07/2014 10:48 PM, Steven Haryanto wrote:
Hi all,

I wonder if it's feasible to use Marpa, like a regular expression, to detect a pattern inside a string. An example of what I'm trying to do is to extract numeric expressions from these strings:

"1+2"
"This is an expression: 1+2, and this is another 1+2+4"
"1+2 is the expression"

I want to recognize and extract 1+2 and 1+2+4 from the above. Here's my current (and failing) attempt:

---
use strict;
use warnings;
use feature 'say';    # check() below uses say
use MarpaX::Simple qw(gen_parser);

my $p = gen_parser(
    grammar => <<'_',
lexeme default  = latm => 1
:default      ::= action=>::first
:start        ::= answer

answer        ::= expr
                | expr any
                | any expr            action=>get1
                | any expr any        action=>get1

expr          ::= num
                | expr '+' expr

num             ~ [\d]+

any             ~ [\d\D]+

:discard        ~ ws
ws              ~ [\s]+
_
    trace_terminals => 1,
    trace_values => 1,
    actions => {
        get0 => sub { $_[1] },
        get1 => sub { $_[2] },
    },
);

sub check { say "Input: $_[0]"; say "Output: ", $p->($_[0]); say "=" x 20 }

check('1');
check('1 + 2');
check('1 + 2 is the expression');
check('This is an expression: 1 + 2 and another 1+2+4');
---

Regards,
Steven
--
You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.
