On Monday, 14 May 2012 at 15:53:34 UTC, Tobias Pankrath wrote:
Quoting your post in another thread:
On Monday, 14 May 2012 at 15:10:25 UTC, Roman D. Boiko wrote:
Making it a class would give several benefits:
* no need to worry about allocating a big array of tokens.
E.g., on a 64-bit OS the largest module in Phobos (IIRC,
std.datetime) consumes 13.5MB in an array of almost 500K
tokens. It would require a 4 times smaller chunk of contiguous
memory if it were an array of class objects, because each element
would consume only 8 bytes instead of 32.
You'll still have to count the space the tokens claim on the heap.
So it's basically the 500k tokens plus 500k references. I'm not
sure why you would need such a big array of tokens, though.
Aren't they produced by the lexer to be directly consumed and
discarded by the parser?
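To put numbers on the struct-versus-class point above, here is a minimal sketch. The token layout and all names (`TokenStruct`, `TokenClass`, the helper functions) are assumptions made for illustration, not the actual token type from this thread.

```d
import std.stdio : writefln;

// Hypothetical layouts, assumed for illustration; not the actual token type.
struct TokenStruct
{
    size_t startIndex; // index of the first code unit
    string spelling;   // slice of the source text
    int kind;          // token category
}

final class TokenClass
{
    size_t startIndex;
    string spelling;
    int kind;
}

// Bytes of contiguous memory needed for an array of n struct tokens.
size_t structArrayBytes(size_t n)
{
    return n * TokenStruct.sizeof;
}

// Bytes of contiguous memory needed for an array of n class references.
size_t referenceArrayBytes(size_t n)
{
    return n * TokenClass.sizeof; // .sizeof of a class type is the reference size
}

// Heap bytes occupied by n class instances (allocator overhead not included).
size_t instanceHeapBytes(size_t n)
{
    return n * __traits(classInstanceSize, TokenClass);
}

void main()
{
    enum n = 500_000;
    writefln("struct array:    %s bytes contiguous", structArrayBytes(n));
    writefln("reference array: %s bytes contiguous", referenceArrayBytes(n));
    writefln("class instances: %s bytes on the heap", instanceHeapBytes(n));
}
```

On a 64-bit target this assumed struct should come out at 32 bytes and a class reference at 8, matching the figures quoted above. The class version needs less contiguous memory but more memory in total, since each instance also carries a hidden vtable pointer and monitor field.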
I use a sorted array of tokens for efficient O(log N) lookup by
start index (the index of the token's first code unit). (Since
tokens are created in increasing order of start indices, no
further sorting is needed.) Lookup is used for two purposes:
* find the token corresponding to location of cursor (e.g., for
auto-complete)
* combined with O(log M) lookup in the ordered array of line start
indices (the first code unit of each line), calculate the Location
(line number and column number) for start / end of a token on
demand (they are not pre-calculated because they are not used
frequently); this approach also
makes it easy to calculate Location either taking into account
special token sequences (#line 3 "ab/c.d"), or ignoring them.
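The two lookups described above could be sketched roughly as follows. This is only an illustration under assumptions: `Token`, `Location`, `countNotAfter`, `tokenIndexAt` and `locationOf` are hypothetical names, columns are counted in code units, and special token sequences such as #line are not handled.

```d
import std.stdio : writeln;

// Hypothetical types, assumed for illustration.
struct Token
{
    size_t start; // index of the token's first code unit
    string text;  // slice of the source
}

struct Location
{
    size_t line;   // 1-based
    size_t column; // 1-based, in code units
}

// Binary search: number of items whose start index is <= key.
// `items` must be ordered by increasing start index.
size_t countNotAfter(alias startOf, T)(const(T)[] items, size_t key)
{
    size_t lo = 0, hi = items.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (startOf(items[mid]) <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

// Index of the last token starting at or before `index`, in O(log N).
// If `index` falls in whitespace, this is the preceding token.
size_t tokenIndexAt(const(Token)[] tokens, size_t index)
{
    immutable count = countNotAfter!(t => t.start)(tokens, index);
    assert(count > 0, "index lies before the first token");
    return count - 1;
}

// Line and column of a code unit index, in O(log M).
// `lineStarts` holds the first code unit index of each line; lineStarts[0] == 0.
Location locationOf(const(size_t)[] lineStarts, size_t index)
{
    immutable line = countNotAfter!(s => s)(lineStarts, index);
    assert(line > 0, "lineStarts must begin with 0");
    return Location(line, index - lineStarts[line - 1] + 1);
}

void main()
{
    // Source text: "int x;\nx = 1;\n"
    auto tokens = [Token(0, "int"), Token(4, "x"), Token(5, ";"),
                   Token(7, "x"), Token(9, "="), Token(11, "1"), Token(12, ";")];
    size_t[] lineStarts = [0, 7];

    writeln(tokens[tokenIndexAt(tokens, 11)].text); // token under the cursor
    writeln(locationOf(lineStarts, 9));             // location of the "=" token
}
```

Because both arrays are filled in increasing order while lexing, no sort step is needed before searching.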
* allow subclassing, for example, for storing strongly typed
literal values; this flexibility could also facilitate future
extensibility (but it's difficult to predict which kind of
extension may be needed)
If performance matters, why would you subclass and risk a
virtual method call for something as basic as tokens?
I agree, but I'm not sure. That's why I created this thread.
* there would be no need to copy data from tokens into AST,
passing an object would be enough (again, copy 8 instead of 32
bytes); the same applies to passing into methods - no need to
pass by ref to minimise overhead
I'm using string to store source content in tokens. Because of
the way string in D works, there is no need for data copies.
So do I. But the size of a string field is still 16 bytes (half
of my token size).
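A small sketch of both points: slicing the source does not copy any characters, yet the slice itself is a length plus a pointer, which is 16 bytes on a 64-bit target. The helper name `spellingOf` is an assumption for illustration.

```d
import std.stdio : writeln;

// Hypothetical helper: the token's spelling as a slice of the source.
// No characters are copied; the result points into `source`.
string spellingOf(string source, size_t start, size_t end)
{
    return source[start .. end];
}

void main()
{
    string source = "int x = 42;";
    string spelling = spellingOf(source, 4, 5);

    // The slice shares memory with the source text.
    assert(spelling == "x");
    assert(spelling.ptr is source.ptr + 4);

    // A slice is a (length, pointer) pair: two machine words.
    static assert(string.sizeof == 2 * size_t.sizeof);
    writeln(string.sizeof);
}
```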
These considerations are mostly about performance. I think
there is also some impact on design, but I couldn't find
anything significant (given that currently I see a token as
merely a data structure without associated behavior).
IMO tokens are value types.
A value type might be implemented as a struct or as a class.