[il-antlr-interest: 30584] Re: [antlr-interest] C target character position

Jim Idle Fri, 19 Nov 2010 10:00:12 -0800

The very first token gives you a -1 for the char position in line, I am
afraid; I need to work around that, I think. The indexes, though, are
pointers into memory (your input) and not 0, 1, 2, etc. Note that the token
also remembers the start of the line that it is located on.


If the start of the first token is not the start of your data, then perhaps
there are comments and newline tokens that are skipped before the first
token that the parser sees? If this did not work, there would be a lot of
broken parsers out there.

So, use the start pointer as the beginning of the text, subtract it from the
end pointer to get the length, and print out that many characters, which will
show you what the token matched. The line start is updated when a '\n' is
seen by the parser, but you can change which character is treated as the line
terminator. The line start is useful for error messages when you want to
print the text line that an error occurs in.
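
As a rough sketch of the pointer arithmetic described above (not code from
the runtime itself): token_span is a hypothetical stand-in for the start,
stop and lineStart fields shown in the quoted output, and the helper names
are made up for illustration. It assumes that stop holds the address of the
last character matched (inclusive), hence the +1; if your runtime stores an
exclusive end, drop the +1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the pointer-valued fields of a token.
   The values are addresses into the input, stored as integers. */
typedef struct {
    intptr_t start;     /* address of the first character of the token */
    intptr_t stop;      /* address of the last character (assumed inclusive) */
    intptr_t lineStart; /* address of the first character of the token's line */
} token_span;

/* Number of characters the token matched; 0 for an empty span. */
size_t token_length(const token_span *t)
{
    assert(t != NULL);
    if (t->stop < t->start)
        return 0;
    return (size_t)(t->stop - t->start + 1);
}

/* Copy the token text into out, NUL-terminated and truncated to fit.
   Returns the number of characters copied. */
size_t token_text(const token_span *t, char *out, size_t cap)
{
    size_t n = token_length(t);
    assert(out != NULL);
    if (cap == 0)
        return 0;
    if (n > cap - 1)
        n = cap - 1;
    memcpy(out, (const char *)t->start, n);
    out[n] = '\0';
    return n;
}

/* Copy the line the token sits on, up to (not including) newline_char.
   buf_end is one past the last byte of the input buffer. */
size_t token_line(const token_span *t, const char *buf_end,
                  char newline_char, char *out, size_t cap)
{
    const char *p;
    size_t n = 0;
    assert(t != NULL && out != NULL);
    if (cap == 0)
        return 0;
    p = (const char *)t->lineStart;
    while (p + n < buf_end && p[n] != newline_char && n < cap - 1)
        n++;
    memcpy(out, p, n);
    out[n] = '\0';
    return n;
}
```

For an error message you would call token_line() with the token in question
and print the result next to the line number.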

The offset of the token is the start pointer minus the input start (use the
address you pass in (dataBuffer) and not input->data); however, the start
pointer is pointing directly at the token's text in your buffer anyway. I
think that you are forgetting that the token stream does not return
off-channel tokens or SKIP()ed tokens.
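
A minimal sketch of that subtraction, with token_offset as a made-up helper
name: it takes the value of the token's start field (an address stored as an
integer) and the buffer pointer that was originally handed to ANTLR, and
returns the character offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of a token from the start of the caller's buffer.
   token_start: value of the token's start field (an address).
   data_buffer: the pointer the caller passed in, not input->data. */
size_t token_offset(intptr_t token_start, const void *data_buffer)
{
    intptr_t base = (intptr_t)data_buffer;
    assert(token_start >= base); /* token must lie inside the buffer */
    return (size_t)(token_start - base);
}
```

With the numbers in the quoted output (start 23213648, dataBuffer 23213624)
this gives 24, which would be consistent with 24 characters of skipped or
off-channel input preceding the first token the parser sees.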

Jim



> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-
> [email protected]] On Behalf Of A Z
> Sent: Friday, November 19, 2010 4:44 AM
> To: [email protected]
> Subject: [antlr-interest] C target character position
> 
> Hello,
> 
>   I'm trying to record the offset of the start of a token, relative to
> the beginning of the input buffer. My program passes a (char *) buffer
> to ANTLR and then runs a simple grammar that builds a data structure
> containing the element types and pointer to their position in the text
> buffer. The problem is I can't find a way to get the true character
> offset from ANTLR in order to set the pointer. Below it prints out the
> results of most of the values for the ANTLR3_COMMON_TOKEN for the very
> first token. The two subsequent values are the data member and the
> address of the character buffer. I would expect start, getStartIndex
> and input->data to be the same but they are different. How can I find
> the offset of a token, in terms of the number of characters from the
> start of the stream?
> 
> Thanks
> 
> charPosition          : -1
> getCharPositionInLine : -1
> getLine               : 1
> getStartIndex         : 23213648
> getStopIndex          : 23213653
> getTokenIndex         : 0
> index                 : 0
> line                  : 1
> lineStart             : 23213648
> start                 : 23213648
> stop                  : 23213653
> 
> (pANTLR3_INPUT_STREAM)input->data 23217928
> (uint8_t*)dataBuffer              23213624
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address


