"Jonathan M Davis", in message (digitalmars.D:173860), wrote:
> struct Token
> {
>     TokenType type;
>     string str;
>     LiteralValue value;
>     SourcePos pos;
> }
>
> struct SourcePos
> {
>     size_t line;
>     size_t col;
>     size_t tabWidth = 8;
> }

The occurrence of tabWidth surprises me. What is col supposed to be? An index (code unit), a character number (code point), or an estimate of where the character will be printed on the line, given the provided tabWidth?

I don't think the lexer can really calculate the column at which a character is printed, since that depends on the editor (if you want to use the lexer for syntax highlighting, for example) and on how it handles combining characters, zero-width or multi-column characters, etc. (which you may not want to have to decode).

You may want to provide the number of tabs met so far. Note that there are other whitespace characters that you may want to count, but you shouldn't have a very complicated SourcePos structure. It might be easier to have whitespace, end-of-line and end-of-file tokens, and let the user filter out or take into account whatever he wants. Or just let the user look into the original string...
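
To make that last option concrete, here is a rough sketch of what I have in mind. It assumes col is a plain code-unit offset from the start of the line, and it leaves the display column to the caller. The displayColumn helper and its parameters are hypothetical names of mine, not part of the proposed lexer, and it naively counts one column per code point, so combining and wide characters still come out wrong, which is rather the point.

```d
// Sketch only: SourcePos without tabWidth. col is assumed to be a
// 0-based offset in code units from the start of the line.
struct SourcePos
{
    size_t line;
    size_t col;
}

// Hypothetical helper, not part of the proposed lexer: estimates the
// display column for a code-unit offset within one line of source text.
// codeUnitOffset must fall on a code point boundary.
size_t displayColumn(string lineText, size_t codeUnitOffset, size_t tabWidth = 8)
{
    size_t column = 0;
    // Iterating a string with a dchar loop variable decodes UTF-8 on the fly.
    foreach (dchar c; lineText[0 .. codeUnitOffset])
    {
        if (c == '\t')
            column += tabWidth - column % tabWidth; // jump to the next tab stop
        else
            ++column; // assumption: every other code point is one column wide
    }
    return column;
}
```

An editor with different rules can then apply its own tab width, or replace the helper entirely, without the lexer having to know anything about it.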