Re: Resolving ambiguity in structure rules

Jeffrey Kegler Fri, 09 May 2014 12:39:27 -0700

There are a few different ways to go. One is to move the decisions about information/non-information comments up into the G1 layer, which is a lot better at resolving ambiguities. The lexer is faster, but it is committed to a "greedy" (longest-match) approach, which is not convenient if your "last resort" is longer than your preferred alternatives.

A second strategy is to use events to deal with all comments using Perl processing. If the issue of comment formats is essentially open-ended (which it might be if you are dealing with real-life data that does not necessarily obey rules), this may be the best way.

It seems you can assume any comment that does not contain an '=' or a second '#' can be tossed, so you could catch these in the lexer, and discard them there. That will save the Perl processing overhead for those cases, which I am guessing will be the majority.
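
The test itself is cheap. As a rough illustration outside Marpa, here is a sketch in Python (used only for brevity; the function name is_discardable is made up for this example). It assumes the comment text starts at its leading '#' and runs to the end of the line.

```python
def is_discardable(comment: str) -> bool:
    """Return True if a comment can be tossed without further inspection.

    Assumption: `comment` begins with its leading '#' and runs to end of line.
    """
    body = comment.rstrip("\n")
    has_equals = "=" in body                 # could be a '#base=<list>' comment
    has_second_hash = body.count("#") >= 2   # could be a '#TAG#' string
    return not (has_equals or has_second_hash)
```

Anything this rejects still needs a closer look; passing the test only means the comment cannot be one of the two informative forms.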

There are some other tricks that may work, depending on what properties you can rely on. Do all these comments always run to the end of the line? If so, you can use lexeme priorities -- that works so long as you are sure you are comparing longest matches. So for example, matching something like "#HOT#", you look for


  TagStr          ::= '#' TagList '#' comment_chars [\n]

where TagList gets moved down into the lexer. But this means you lose the ability to pull the tags apart using the Marpa grammar, and will have to do it in a callback or in post-processing.
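
If TagList does end up as a single lexeme, pulling the tags apart afterwards is a small job. A sketch in Python, assuming the lexeme text has the shape '#tag1,tag2#' with word characters only and no whitespace (as described in the quoted message below); split_tags and TAG_STR are hypothetical names, not part of Marpa.

```python
import re

# Assumed shape of the lexeme: '#', comma-separated \w+ tags, '#'.
TAG_STR = re.compile(r"#(\w+(?:,\w+)*)#")

def split_tags(lexeme: str) -> list[str]:
    """Split a matched tag string such as '#HOT,COLD#' into its tags."""
    match = TAG_STR.fullmatch(lexeme)
    if match is None:
        raise ValueError(f"not a tag string: {lexeme!r}")
    return match.group(1).split(",")
```

The same logic would go into the callback or the post-processing pass, in whatever language that is written in.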

If these comments do not suggest a "right" answer to you, you could add to the test set, which will make it easier to experiment and know if a solution actually does what you need.


-- jeffrey

On 05/09/2014 12:09 PM, [email protected] wrote:

You have the right idea. Unfortunately, I do not get to dictate the syntax of this file I get to parse and there is considerable ambiguity in comments. There are essentially three forms of a comment. Two forms of this comment include information I need to parse. One form (non-information comment) does not contain useful information.

1) embedded base number --> Matches OptEmbeddedBase --> Actual information I need. Discernible from a non-information comment by its location immediately after the opening of a pattern list brace and that it must contain '#base=<list>', where <list> is a comma-delimited list of integers.
2) tag string --> Matches TagStr --> Again, information I need. Discernible from a non-information comment by location after a pattern declaration and by the fact that it is bookended by '#' symbols and can only contain a comma-delimited list of word (\w) characters. Technically, whitespace is not allowed inside these strings either. I figured I'd sort that out once I had it matching as is.
3) Non-information comment --> Matches COMMENT --> Can be discarded. This is any comment that does not match one of the first two forms.
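
The three forms above can be sketched as a text-only classifier. This is Python rather than Marpa, and it deliberately ignores the position rules (after the opening brace, after a pattern declaration), which a real grammar would also need. The regexes and the names BASE, TAGS and classify are assumptions made for illustration.

```python
import re

# Form 1: '#base=' followed by a comma-delimited list of integers.
BASE = re.compile(r"#base=\d+(?:,\d+)*")
# Form 2: comma-delimited \w+ tags bookended by '#', no whitespace.
TAGS = re.compile(r"#\w+(?:,\w+)*#")

def classify(comment: str) -> str:
    """Return 'base', 'tags', or 'comment' for text starting at a '#'."""
    if BASE.match(comment):
        return "base"
    if TAGS.match(comment):
        return "tags"
    return "comment"  # form 3: anything that is not one of the first two
```

Both patterns match only at the start of the text, so trailing material after a tag string (as in '#HOT# # Not so hot') does not change the result.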

Hopefully that's helpful. When you say that you'd 'simply say that in the grammar', I'm confused. Is this not what I'm saying in the grammar in the TagStr rule by setting '#' characters before and after the TagList rule? Is there a better way to resolve this ambiguity?

On Friday, May 9, 2014 11:46:16 AM UTC-7, Jeffrey Kegler wrote:

    Trying to get the idea, is it that tags use '#' as a delimiter, much in
    the same way that strings use quotes?  And that it's a comment if
    there's a '#' that is not matched before the newline?  That is, that in

         Pat n2000000g0000002; #HOT# # Not so hot

    "#HOT#" is a tag, and "# Not so hot" is a comment?

    If that's the case, I'd simply say that in the grammar.  I'd give more
    detail, but I'm not 100% clear on the intent at this point.

    -- jeffrey

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.

