Re: Finding a parse inside a (potentially long) string?

Jeffrey Kegler Mon, 09 Jun 2014 10:36:06 -0700

Where the search target *is* a regular expression, Marpa will never be competitive with regexes. But regexes get used for a lot of things which are NOT regular expressions, and on these Marpa can and does win.

I've used matching parentheses as an example. These are not regular expressions, but regexes get used for them anyway. And in "easy" cases, regexes still win. But in "hard" cases Marpa is 10x faster or more. I did a detailed write-up on my blog twice: the 2nd version is here <http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2012/08/marpa-v-perl-regexes-a-rematch.html>. Basically, the story is that when the regex has to do back-tracking, Marpa wins. Marpa does all its parsing without back-tracking.

Interesting applications for Marpa pattern searching might be things like finding unmatched parens, brackets, etc. in a programming language, taking into account strings, comments, etc. You can't do that with a pure regular expression, and a regex that attempts it will be unreadable and slow.
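To make the shape of that problem concrete, here is a minimal hand-written sketch in plain Perl. It is not Marpa and not anything from the blog post: the helper name `unmatched_brackets` and the lexical conventions (double-quoted strings with backslash escapes, `#` line comments) are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return [bracket, offset] pairs for every unmatched
# bracket in $src. Brackets inside double-quoted strings and '#' line
# comments are ignored (an invented, simplified lexical convention).
sub unmatched_brackets {
    my ($src) = @_;
    my %closer_for = ( '(' => ')', '[' => ']', '{' => '}' );
    my ( @open, @unmatched );

    pos($src) = 0;
    while ( pos($src) < length($src) ) {
        if ( $src =~ /\G"(?:[^"\\]|\\.)*"?/gcs ) {
            # skipped a string literal (possibly unterminated)
        }
        elsif ( $src =~ /\G#[^\n]*/gc ) {
            # skipped a comment up to end of line
        }
        elsif ( $src =~ /\G([(\[{])/gc ) {
            push @open, [ $1, pos($src) - 1 ];    # remember the opener
        }
        elsif ( $src =~ /\G([)\]}])/gc ) {
            if ( @open and $closer_for{ $open[-1][0] } eq $1 ) {
                pop @open;                        # closes the latest opener
            }
            else {
                push @unmatched, [ $1, pos($src) - 1 ];
            }
        }
        else {
            $src =~ /\G./gcs;                     # any other character
        }
    }

    # Openers still on the stack were never closed.
    my @sorted = sort { $a->[1] <=> $b->[1] } @unmatched, @open;
    return @sorted;
}

my $code = qq{f(a, "(" ) # )\n g[1};
printf "unmatched '%s' at offset %d\n", @$_ for unmatched_brackets($code);
```

On the sample input only the `[` at offset 17 is reported; the parens inside the string and the comment are skipped. Every new lexical rule (nested comments, heredocs, and so on) means more hand-written branches, which is the kind of work a grammar is meant to take over.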

You can think of it as a tortoise and hare thing. Marpa's a good steady predictable tortoise, and it will win if the course is difficult. But for a simple regular expression, pick the hare.


-- jeffrey

On 06/09/2014 10:03 AM, Steven Haryanto wrote:

Thanks for the answer and explanation. I see that the second approach is about 50% faster on my PC. Although speed-wise it's not on par with regex for this simple case[*], it's interesting nevertheless and will be useful in certain cases.

*) I did a simple benchmark for the string: ("a" x 1000) . " 1+2 " . ("a" x 1000). With regex search: while ($input =~ /(\d+(\s*\+\s*\d+)*)/g) { ... } I get around 250k searches/sec. With the two Marpa grammars I get roughly 200/sec and 300/sec.
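For anyone who wants to reproduce the regex side of that measurement, a minimal sketch follows. The test string and the pattern are the ones quoted above; the helper name `find_sums` and the use of the core Benchmark module are assumptions for illustration, not necessarily how the original numbers were obtained.

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# Test string from the message: 1000 "a"s, " 1+2 ", then 1000 more "a"s.
my $input = ( "a" x 1000 ) . " 1+2 " . ( "a" x 1000 );

# Hypothetical helper: collect [start offset, matched text] for every hit.
sub find_sums {
    my ($text) = @_;
    my @found;
    while ( $text =~ /(\d+(\s*\+\s*\d+)*)/g ) {
        push @found, [ $-[1], $1 ];
    }
    return @found;
}

printf "offset %d: %s\n", @$_ for find_sums($input);

# Run the search repeatedly for at least 1 CPU second and print the rate.
timethis( -1, sub { my @hits = find_sums($input) } );
```

The rate printed will depend on the machine and the Perl version, so it should not be expected to match the figures above exactly.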


Regards,
Steven


On Sunday, 08 June 2014 23:24:21 UTC+7, Jeffrey Kegler wrote:

    I've done a 2nd version of this <http://scsys.co.uk:8002/391796>,
    which I think should be faster and, especially, take less memory.

    This technique is, I hope, of wide interest.  To do an
    "unanchored" search, it uses Marpa's :discard mechanism.
    Essentially, it treats strings that are not part of the search
    target as whitespace.

    The SLIF is quite compact, but an explanation may help.  I set the
    grammar up to discard any single character, a lexeme of length 1:

    :discard ~ [\d\D]

    Discard will always be the last choice.  Even in LATM, longest
    match wins and, with length 1, a discard lexeme can at best tie
    for longest match.  Whenever there is a non-discard match, that is
    preferred.  Bottom line: discard always loses, unless there is no
    other choice.

    Then, for your search patterns, you define other lexemes. As just
    shown, when they match they will always be preferred.  If you want
    all matches, you make your top rule

    string        ::= target+

    where <target> is the pattern you are searching for.
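The priority rule can be imitated in a few lines of plain Perl. This is only a rough sketch of the idea and does NOT use Marpa: the helper `scan_targets` and the target pattern are assumptions for illustration, and a real Marpa parse also takes into account which lexemes the grammar expects at each position.

```perl
use strict;
use warnings;

# Plain-Perl imitation of "discard is the last choice" (not Marpa).
# At each position, try the target pattern first; only if it fails,
# throw away exactly one character and move on.
# $target_re must not match the empty string.
sub scan_targets {
    my ( $text, $target_re ) = @_;
    my @targets;
    pos($text) = 0;
    while ( pos($text) < length($text) ) {
        if ( $text =~ /\G($target_re)/gc ) {
            push @targets, $1;        # non-discard match is preferred
        }
        else {
            $text =~ /\G[\d\D]/gc;    # discard: any single character
        }
    }
    return @targets;
}

# Hypothetical target: digit runs joined by '+', with optional whitespace.
my $sum = qr/\d+(?:\s*\+\s*\d+)*/;

my $line = "This is an expression: 1+2, and this is another 1+2+4";
print join( ", ", scan_targets( $line, $sum ) ), "\n";
```

For the sample line this prints `1+2, 1+2+4`. Everything that is not part of a target is consumed one character at a time, which is the role the length-1 discard lexeme plays in the grammar.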

    -- jeffrey

    On 06/07/2014 10:48 PM, Steven Haryanto wrote:

    Hi all,

    I wonder if it's feasible to use Marpa, like regular expression,
    to detect some pattern inside a string. An example of what I'm
    trying to do is to extract some numeric expression from these
    strings:

    "1+2"
    "This is an expression: 1+2, and this is another 1+2+4"
    "1+2 is the expression"

    I want to recognize and extract 1+2 and 1+2+4 from the above.
    Here's my current (and failing) attempt:

    ---
    use feature 'say';    # needed for say() in check() below
    use MarpaX::Simple qw(gen_parser);

    my $p = gen_parser(
        grammar => <<'_',
    lexeme default  = latm => 1
    :default      ::= action=>::first
    :start        ::= answer

    answer        ::= expr
                    | expr any
                    | any expr  action=>get1
                    | any expr any  action=>get1

    expr          ::= num
                    | expr '+' expr

    num             ~ [\d]+

    any             ~ [\d\D]+

    :discard        ~ ws
    ws              ~ [\s]+
    _
        trace_terminals => 1,
        trace_values => 1,
        actions => {
            get0 => sub { $_[1] },
            get1 => sub { $_[2] },
        },
    );

    sub check { say "Input: $_[0]"; say "Output: ", $p->($_[0]); say "=" x 20 }

    check('1');
    check('1 + 2');
    check('1 + 2 is the expression');
    check('This is an expression: 1 + 2 and another 1+2+4');
    ---

    Regards,
    Steven

    --
    You received this message because you are subscribed to the
    Google Groups "marpa parser" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected].
    For more options, visit https://groups.google.com/d/optout.

