On 2012-05-11 11:22, Roman D. Boiko wrote:

What about line and column information?
Indices of the first code unit of each line are stored inside the lexer,
and a function will compute the Location (line number, column number,
file specification) for any index. This way the size of a Token instance
is reduced to the minimum. It is assumed that a Location can be computed
on demand and is not needed frequently, so the column is calculated by
walking back to the previous end of line, etc. It will be possible to
calculate Locations either taking special token sequences (e.g., #line 3
"ab/c.d") into account, or discarding them.
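
The scheme described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's actual D code; all names here are hypothetical, and it uses a binary search over the stored line starts in place of a literal reverse walk.

```python
import bisect

def line_starts(source):
    """Indices of the first code unit of each line."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def location(starts, index):
    """Compute (line, column), 1-based, on demand from a token's index."""
    line = bisect.bisect_right(starts, index) - 1
    column = index - starts[line]
    return line + 1, column + 1

src = "int x;\nx = 42;\n"
starts = line_starts(src)
# the literal "42" starts on line 2, column 5
assert location(starts, src.index("42")) == (2, 5)
```

Since only the line-start indices are stored, each token needs to carry just its index; the (line, column) pair is reconstructed only when someone actually asks for it.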

Aha, clever. As long as I can get out the information I'm happy :) How about adding properties for this in the token struct?

* Does it convert numerical literals and similar to their actual values?
It is planned to add a post-processor for that as part of parser,
please see README.md for some more details.

Isn't that a job for the lexer?
That might be done in the lexer for efficiency reasons (to avoid lexing
the token value again). But separating this into a dedicated
post-processing phase leads to a much cleaner design (IMO), and is also
suitable for uses where such values are not needed.

That might be the case. But I don't think it belongs in the parser.

Also, I don't think that performance would be improved, given the ratio
of the number of literals to the total number of tokens and the need to
store additional information per token if it is done in the lexer. I
will elaborate on that later.

Ok, fair enough. Perhaps this could be a property in the Token struct as well. In that case I would suggest renaming "value" to lexeme/spelling/representation, or something like that, and then name the new property "value".
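
The suggested split between the raw spelling and the converted value might look something like this. A Python sketch only (the project itself is in D), and the field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    kind: str    # e.g. "intLiteral"
    lexeme: str  # the raw spelling as it appears in the source
    index: int   # start index into the source

    @property
    def value(self):
        """Converted value, computed on demand; here only integer
        literals are handled, everything else falls back to the lexeme."""
        if self.kind == "intLiteral":
            return int(self.lexeme, 0)  # base 0 accepts 0x…/0o…/0b… prefixes
        return self.lexeme

tok = Token("intLiteral", "0x2A", 11)
assert tok.lexeme == "0x2A"
assert tok.value == 42
```

Keeping "value" as a property means the conversion cost is paid only for tokens whose converted value is actually requested, which matches the argument above about the literal-to-token ratio.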

--
/Jacob Carlborg
