Re: struct vs class for a simple token in my d lexer

Roman D. Boiko Mon, 14 May 2012 09:48:45 -0700

On Monday, 14 May 2012 at 15:53:34 UTC, Tobias Pankrath wrote:

Quoting your post in another thread:
On Monday, 14 May 2012 at 15:10:25 UTC, Roman D. Boiko wrote:
Making it a class would give several benefits:
* allows not worrying about allocating a big array of tokens. E.g., on a 64-bit OS the largest module in Phobos (IIRC, std.datetime) consumes 13.5MB in an array of almost 500K tokens. It would require a 4 times smaller chunk of contiguous memory if it were an array of class objects, because each would consume only 8 bytes instead of 32.
You'll still have to count the space the tokens claim on the heap. So it's basically the 500k tokens plus 500k references. I'm not sure why you would need such a big array of tokens, though.
Aren't they produced by the lexer to be directly consumed and discarded by the parser?
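
To put numbers on the two positions above, here is a rough back-of-the-envelope sketch in Python (used only as a calculator, not as lexer code). The 32-byte token, the 8-byte reference and the 13.5MB array come from the thread; the 16-byte per-object header is an assumption added for illustration, and allocator rounding is ignored.

```python
# Sizes quoted in the thread (64-bit build).
TOKEN_BYTES = 32                         # one token stored as a struct
REF_BYTES = 8                            # one class reference
STRUCT_ARRAY_BYTES = int(13.5 * 2**20)   # the 13.5MB token array

# Assumption, not from the thread: hidden header of each class instance.
OBJECT_HEADER_BYTES = 16

token_count = STRUCT_ARRAY_BYTES // TOKEN_BYTES

# Class variant: a contiguous array of references plus one heap object per token.
ref_array_bytes = token_count * REF_BYTES
heap_bytes = token_count * (TOKEN_BYTES + OBJECT_HEADER_BYTES)
class_total_bytes = ref_array_bytes + heap_bytes


def mb(n):
    """Convert a byte count to megabytes (2**20 bytes)."""
    return n / 2**20


print("tokens:", token_count)
print("struct array, contiguous MB:", mb(STRUCT_ARRAY_BYTES))
print("class refs, contiguous MB:", mb(ref_array_bytes))
print("class variant, total MB:", mb(class_total_bytes))
```

Under these assumptions both statements hold at once: the contiguous block shrinks to a quarter, while the total memory claimed grows because every token also lives on the heap.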

I use a sorted array of tokens for efficient O(log N) lookup by index (the first code unit of the token). (Since tokens are created in increasing order of start indices, no further sorting is needed.) Lookup is used for two purposes:
* find the token corresponding to the location of the cursor (e.g., for auto-complete)
* combined with O(log M) lookup in the ordered array of the first code unit index of each line, calculate the Location (line number and column number) for the start / end of a token on demand (they are not pre-calculated because they are not used frequently); this approach also makes it easy to calculate Location either taking into account special token sequences (#line 3 "ab/c.d"), or ignoring them.
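
The two lookups described above can be sketched as a pair of binary searches. This is a minimal, language-neutral illustration in Python rather than the actual lexer code; the names `find_token`, `location`, `token_starts` and `line_starts` are hypothetical, and handling of special token sequences such as #line is left out.

```python
from bisect import bisect_right

# Hypothetical sample input: "int x;\nx = 1;\n"
# Start index (first code unit) of every token, already in increasing order.
token_starts = [0, 4, 5, 7, 9, 11, 12]
# Index of the first code unit of every line.
line_starts = [0, 7]


def find_token(token_starts, index):
    """O(log N): position of the last token starting at or before index.

    Returns None if index lies before the first token. The caller still
    has to check the token's end if index may fall into whitespace.
    """
    pos = bisect_right(token_starts, index) - 1
    return pos if pos >= 0 else None


def location(line_starts, index):
    """O(log M): 1-based (line, column) of a code unit index.

    Assumes line_starts[0] == 0 and index >= 0.
    """
    line = bisect_right(line_starts, index) - 1
    return line + 1, index - line_starts[line] + 1


cursor = 9                                   # points at '=' on the second line
token_pos = find_token(token_starts, cursor)
print(token_pos, location(line_starts, token_starts[token_pos]))
```

Because tokens are appended in source order, `token_starts` never needs sorting, and the Location is computed only when somebody asks for it.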

* allow subclassing, for example, for storing strongly typed literal values; this flexibility could also facilitate future extensibility (but it's difficult to predict which kind of extension may be needed)
If performance matters, why would you subclass and risk a virtual method call for something as basic as tokens?

Agree, but not sure. That's why I created this thread.

* there would be no need to copy data from tokens into the AST, passing an object would be enough (again, copy 8 instead of 32 bytes); the same applies to passing into methods - no need to pass by ref to minimise overhead
I'm using string to store source content in tokens. Because of the way string in D works, there is no need for data copies.

So do I. But the size of a string field is still 16 bytes (half of my token size).

These considerations are mostly about performance. I think there is also some impact on design, but couldn't find anything significant (given that currently I see a token as merely a data structure without associated behavior).
IMO tokens are value types.

The value type might be implemented as struct or class.
