[il-antlr-interest: 30683] Re: [antlr-interest] Identifiers with Spaces

Michael Bosch Mon, 29 Nov 2010 14:34:06 -0800

Hi William!

On Fri, 2010-11-26 at 21:42 -0700, William Clodius wrote:
> There are workarounds for your specific problem, but in general I would 
> suggest a complete revision of your approach.


Which other workarounds are there?  Can you give me some pointers?

Does this mean that there is no simple solution with ANTLR?

I played around with it some more and noticed that my lexer rules
are actually just regular expressions.  This is probably the usual
case for lexers.  So I just threw my problem at GNU sed, and
it solves my tokenization problem perfectly:

command: sed 's/\(a\+\( \+a\+\)*\| \|=\)/[\1]/g'
input: a aa = aa
output: [a aa][ ][=][ ][aa]
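
For comparison, here is a minimal sketch of the same tokenization in
Python. It assumes the toy alphabet from the sed example (identifiers made
of the letter "a", plus single spaces and "="); the function name
bracket_tokens is made up for illustration.

```python
import re

# Same pattern as the sed command, written in Python's regex syntax:
# a run of a's optionally followed by (spaces + a's) groups,
# or a single space, or an equals sign.
TOKEN = re.compile(r"a+(?: +a+)*| |=")


def bracket_tokens(text):
    """Wrap every recognized token in square brackets."""
    return TOKEN.sub(lambda m: "[" + m.group(0) + "]", text)


print(bracket_tokens("a aa = aa"))  # prints [a aa][ ][=][ ][aa]
```

The space before "=" is not absorbed into the identifier because the
optional group needs at least one "a" after the spaces, so the regex
engine backtracks and leaves that space as its own token.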

Granted, the syntax is ugly and I would have to somehow put this into
code. But it gave me the idea of creating a simple preprocessor
that frames the identifiers with \u0002 and \u0003, such that
ANTLR recognizes them without problem.
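
A rough sketch of such a preprocessor in Python follows. The word pattern
(ASCII letters, digits and underscore) and the name frame_identifiers are
assumptions made for illustration, not part of the original grammar. The
grammar would then need a lexer rule that matches everything between the
two control characters.

```python
import re

STX = "\u0002"  # marks the start of an identifier
ETX = "\u0003"  # marks the end of an identifier

# Assumption: a "word" is an ASCII letter or underscore followed by
# letters, digits or underscores. Adjust to the real identifier rules.
WORD = r"[A-Za-z_][A-Za-z0-9_]*"

# An identifier is one word, optionally followed by more words that are
# each separated from the previous one by one or more spaces.
IDENT = re.compile(WORD + r"(?: +" + WORD + r")*")


def frame_identifiers(text):
    """Surround each space-containing identifier with STX and ETX."""
    return IDENT.sub(lambda m: STX + m.group(0) + ETX, text)


framed = frame_identifiers("a aa = aa")
print(repr(framed))
```

One caveat of this sketch: any keyword that looks like a word would be
swallowed into the neighbouring identifier, so a real preprocessor would
have to exclude keywords explicitly.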

> What you are trying to do is generally better addressed during the semantic 
> analysis than during the lexical construction. I suggest the following 
> approach
> 
> id_sequence : ID ID*
> 
> where ID is whatever you allow in an identifier between spaces. Then during 
> the semantic analysis wherever you find an id_sequence in effect treat the 
> first ID as a function that takes the rest of the id_sequence as an argument 
> returning an "identifier". This analysis can be performed recursively for 
> each ID in the sequence. The implementation is straightforward, but tedious, 
> and of course left to the student.

Actually the spaces are part of the identifier and are significant.
That means I would have to know how many spaces were between each pair
of consecutive IDs in an id_sequence.  I saw somebody mention that you
could somehow access the hidden channel used to ignore spaces, but I did
not find any good explanation of how to do that.
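
To illustrate why the whitespace has to stay visible, here is a small
Python model of the idea. It does not use the ANTLR API. The token kinds
(ID, WS, OP) and both function names are hypothetical. Whitespace is kept
as ordinary tokens, and every ID (WS ID)* run is merged back into a
single identifier with its original spacing.

```python
import re

# Hypothetical token kinds for this sketch: ID, WS (spaces), OP ("=").
TOKEN = re.compile(r"(?P<ID>[A-Za-z_][A-Za-z0-9_]*)|(?P<WS> +)|(?P<OP>=)")


def tokenize(text):
    """Split text into (kind, text) pairs, keeping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append((m.lastgroup, m.group(0)))
        pos = m.end()
    return tokens


def merge_identifiers(tokens):
    """Merge each ID (WS ID)* run into one ID, preserving the spaces."""
    merged = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "ID":
            # Absorb every following (WS, ID) pair into this identifier.
            while (i + 2 < len(tokens)
                   and tokens[i + 1][0] == "WS"
                   and tokens[i + 2][0] == "ID"):
                text += tokens[i + 1][1] + tokens[i + 2][1]
                i += 2
        merged.append((kind, text))
        i += 1
    return merged


print(merge_identifiers(tokenize("a  aa = aa")))
```

Because the WS tokens carry their text, "a  aa" (two spaces) and "a aa"
(one space) come out as different identifiers, which is exactly the
information lost when whitespace is simply discarded.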

Michael


