The very first token gives you a -1 for the char position in line, I'm afraid; I think I need to work around that. Note, though, that the indexes are pointers into memory (your input), not 0, 1, 2, etc. The token also remembers the start of the line that it is located on.
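Below is a minimal sketch of the pointer arithmetic described in this reply. It is plain C so it compiles without the ANTLR runtime. `TokenView` and the helper names are hypothetical: the struct only mirrors the `start`, `stop` and `lineStart` values printed in the question. It assumes those values are addresses into the caller's buffer and that `stop` addresses the last character of the token (inclusive).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the ANTLR3_COMMON_TOKEN fields discussed here.
 * Assumption: start, stop and lineStart are addresses into the caller's
 * buffer (the C runtime stores them as integer markers; cast them to a
 * char pointer), and stop addresses the LAST character of the token. */
typedef struct {
    const char *start;     /* first character of the token        */
    const char *stop;      /* last character of the token         */
    const char *lineStart; /* first character of the token's line */
} TokenView;

/* Character offset of the token from the buffer you handed to ANTLR
 * (dataBuffer), not from input->data. */
ptrdiff_t token_offset(const TokenView *t, const char *dataBuffer)
{
    assert(t->start >= dataBuffer);
    return t->start - dataBuffer;
}

/* Number of characters the token matched; +1 because stop is inclusive. */
ptrdiff_t token_length(const TokenView *t)
{
    return t->stop - t->start + 1;
}

/* Print the matched text in place: "%.*s" takes an explicit length,
 * so the token text does not need to be NUL-terminated. */
void print_token_text(const TokenView *t)
{
    printf("%.*s\n", (int)token_length(t), t->start);
}

/* Length of the source line holding the token, scanning from lineStart
 * to the next '\n' or the end of the buffer (handy for error messages). */
ptrdiff_t line_length(const TokenView *t, const char *bufferEnd)
{
    const char *p = t->lineStart;
    while (p < bufferEnd && *p != '\n')
        p++;
    return p - t->lineStart;
}
```

Applied to the numbers in the question, the offset would be 23213648 - 23213624 = 24 characters and the length 23213653 - 23213648 + 1 = 6. In other words, the first token the parser sees starts 24 characters into the buffer, presumably after text that was skipped or sent off channel.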
If the start of the first token is not the start of your data, then perhaps there are comments and newline tokens that are skipped before the first token that the parser sees? If this did not work, there would be a lot of broken parsers out there.

So, use the start pointer, subtract it from the stop pointer and add one (stop points at the last character of the token) to get the length, then print out that many characters; that will show you what the token matched.

The line start is updated whenever a '\n' is consumed, but you can change which character is treated as the newline. This is useful for error messages, when you want to print the line of text that an error occurs in.

The offset of the token is the start pointer minus the start of the input. Use the address you passed in (dataBuffer), not input->data. The start pointer already points directly at the token text anyway.

I think you are forgetting that the token stream does not return off-channel tokens or SKIP()ed tokens.

Jim

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-
> [email protected]] On Behalf Of A Z
> Sent: Friday, November 19, 2010 4:44 AM
> To: [email protected]
> Subject: [antlr-interest] C target character position
>
> Hello,
>
> I'm trying to record the offset of the start of a token, relative to
> the beginning of the input buffer. My program passes a (char *) buffer
> to ANTLR and then runs a simple grammar that builds a data structure
> containing the element types and pointer to their position in the text
> buffer. The problem is I can't find a way to get the true character
> offset from ANTLR in order to set the pointer. Below it prints out the
> results of most of the values for the ANTLR3_COMMON_TOKEN for the very
> first token. The two subsequent values are the data member and the
> address of the character buffer. I would expect start, getStartIndex
> and input->data to be the same but they are different. How can I find
> the offset of a token, in terms of the number of characters from the
> start of the stream?
>
> Thanks
>
> charPosition : -1
> getCharPositionInLine : -1
> getLine : 1
> getStartIndex : 23213648
> getStopIndex : 23213653
> getTokenIndex : 0
> index : 0
> line : 1
> lineStart : 23213648
> start : 23213648
> stop : 23213653
>
> (pANTLR3_INPUT_STREAM)input->data 23217928
> (uint8_t*)dataBuffer 23213624
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address
