As a matter of fact, I cannot. I have seen actual cases of '#' characters
in comments. They've confused some other code, unrelated to mine, as well.
I had hoped the strict format of a tag string or an embedded base number
string would be sufficient to differentiate them from random comments.
It appears that's not really the case. Perhaps my only real choice is
to pause when I find the initial '#' character and parse the comment
myself. Telling a tag string or an embedded base number from a comment is
trivial, but I don't know how you would determine context in that case, as
both are location sensitive. As a matter of fact, a real comment embedded in
one of the files looks like this:
###### END OF CHECKERBOARD #####
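
To make "parse the comment myself" a bit more concrete, here is a rough sketch of the decision that would have to happen at the initial '#'. It is written in Python purely for illustration, not as Marpa or Perl code. The function name, the context labels ("brace", "pattern"), and both regexes are assumptions of mine, derived from the three comment forms described further down this thread, and the '#base=1,2,3' input is a made-up sample.

```python
import re

# Hypothetical patterns, reconstructed from the descriptions in this thread:
#   embedded base: '#base=' followed by a comma-delimited list of integers
#   tag string:    '#', a comma-delimited list of \w words (no whitespace), '#'
BASE_RE = re.compile(r"#base=(\d+(?:,\d+)*)")
TAG_RE = re.compile(r"#(\w+(?:,\w+)*)#")


def classify_hash(text, context):
    """Classify text that starts at a '#' character.

    context is an assumed label for where the '#' was found:
      "brace"   -> immediately after a pattern list's opening brace
      "pattern" -> immediately after a pattern declaration
      anything else -> no special location
    Returns (kind, values, rest), where rest is the unconsumed text.
    """
    if context == "brace":
        m = BASE_RE.match(text)
        if m:
            numbers = [int(n) for n in m.group(1).split(",")]
            return "base", numbers, text[m.end():]
    if context == "pattern":
        m = TAG_RE.match(text)
        if m:
            return "tag", m.group(1).split(","), text[m.end():]
    # Anything else is a plain comment that runs to the end of the line.
    return "comment", [], ""


print(classify_hash("#KEEP# } }", "pattern"))
print(classify_hash("###### END OF CHECKERBOARD #####", "pattern"))
```

The checkerboard line above falls through to "comment" because nothing word-like sits between its first two '#' characters. A plain comment that happens to look like '#word#' in tag position would still be misread, so this does not remove the ambiguity; it only shows where the location check would have to go.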
On Friday, May 9, 2014 1:41:14 PM UTC-7, Jeffrey Kegler wrote:
>
> One question: can you rely on a non-tag comment not containing a hash?
> That is, can you rely on there being nothing like
>
> PList file1.plist:plist3; # An extra hash # as if life was not already too
> difficult
>
>
> in the data? If so, you can treat a hash ('#') as something that ends a
> comment, in addition to newlines, and that will be a big step forward.
>
> -- jeffrey
>
> On 05/09/2014 01:08 PM, [email protected] wrote:
>
> Yeah, that line is definitely the problematic line. It's also the reason
> I'm rebuilding the parser from my current line by line methodology. Or
> attempting to :) I actually wrote this grammar up in Regexp::Grammars
> first, but the resource requirements were far too high. I figured I'd take
> the time to learn Marpa as the capabilities and performance seem more in
> line with what I needed.
>
> I believe event parsing the comments myself might be the way to go. I was
> also reading ranking documentation this morning, but I didn't get a good
> handle on it at all. Maybe I'll play with it and see what happens.
>
> Thanks for your time and insight here Jeffrey, I appreciate it :)
>
> On Friday, May 9, 2014 12:55:07 PM UTC-7, Jeffrey Kegler wrote:
>>
>> I just took a second look at this one
>>
>> GlobalPList plist4 { Pat n8000000g0000008; #KEEP# } }
>>
>> Ouch! The solution in the face of stuff like this may be to not treat
>> comments at the lexical level, but at the G1 level. That is, treat the
>> '#', ',', tags, etc. as lexemes and parse comments as if they were
>> statements. In your situation, that seems in effect to be the case. Your
>> comments seem to have more structure and variety than some of the
>> "statements". They are not just whitespace equivalents.
>>
>> At the G1 level you can use the rule "rank" adverb (
>> https://metacpan.org/pod/distribution/Marpa-R2/pod/Scanless/DSL.pod#rank),
>> and Marpa can help with the internal semantics of the comments, etc.
>>
>> I notice, by the way, that my documentation of the "rank" adverb could be
>> improved.
>>
>> -- jeffrey
>>
>> On 05/09/2014 12:09 PM, [email protected] wrote:
>>
>> You have the right idea. Unfortunately, I do not get to dictate the
>> syntax of this file I get to parse and there is considerable ambiguity in
>> comments. There are essentially three forms of a comment. Two forms of
>> this comment include information I need to parse. One form
>> (non-information comment) does not contain useful information.
>>
>> 1) embedded base number --> Matches OptEmbeddedBase --> Actual
>> information I need. Discernible from a non-information comment by its
>> location immediately after the opening brace of a pattern list and by the
>> fact that it must contain '#base=<list>', where <list> is a comma-delimited
>> list of integers.
>>
>> 2) tag string --> Matches TagStr --> Again, information I need.
>> Discernible from a non-information comment by its location after a pattern
>> declaration and by the fact that it is bookended by '#' symbols and can
>> only contain a comma-delimited list of word (\w) characters. Technically,
>> whitespace is not allowed inside these strings either. I figured I'd sort
>> that out once I had it matching as is.
>>
>> 3) Non-information comment --> Matches COMMENT --> Can be discarded. This
>> is any comment that does not match one of the first two forms.
>>
>> Hopefully that's helpful. When you say that you'd 'simply say that in
>> the grammar', I'm confused. Is this not what I'm saying in the grammar in
>> the TagStr rule by setting '#' characters before and after the TagList
>> rule? Is there a better way to resolve this ambiguity?
>>
>> On Friday, May 9, 2014 11:46:16 AM UTC-7, Jeffrey Kegler wrote:
>>>
>>> Trying to get the idea, is it that tags use '#' as a delimiter, much in
>>> the same way that strings use quotes? And that it's a comment if
>>> there's a '#' that is not matched before the newline? That is, that in
>>>
>>> Pat n2000000g0000002; #HOT# # Not so hot
>>>
>>> "#HOT#" is a tag, and "# Not so hot" is a comment?
>>>
>>> If that's the case, I'd simply say that in the grammar. I'd give more
>>> detail, but I'm not 100% clear on the intent at this point.
>>>
>>> -- jeffrey
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "marpa parser" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.