When tokenizing large files,
it is normal to end up with many copies of the same string.

Normal Julia strings are not interned.
This means that if you accumulate a large list of tokens,
you end up duplicating a lot of strings, which uses unnecessary memory.

When you are tokenizing documents that are multiple gigabytes long,
this really adds up.


`Symbol`s *are* interned.
Are there any downsides to using them when an interned string is required?

I tried testing them for this a while ago, and got huge improvements in memory
use, and thus also in speed (allocating memory is expensive).

There are no `convert` methods defined for switching between symbols and
strings, but `string(::Symbol)` and `symbol(::AbstractString)` work.
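To make the idea concrete, here is a minimal sketch of interning tokens as symbols and converting one back. The helper name `intern_tokens` and the whitespace-only tokenization are just assumptions for illustration, not anything from a real tokenizer. Note that on current Julia releases the constructor is spelled `Symbol(...)`; `symbol(...)` is the older spelling.

```julia
# Hypothetical helper: split on whitespace and intern every token as a Symbol.
# `Symbol(...)` is the spelling on current Julia; older releases used `symbol(...)`.
intern_tokens(text::AbstractString) = [Symbol(tok) for tok in split(text)]

tokens = intern_tokens("the cat saw the other cat")

# Equal Symbols refer to the same interned object, so the repeated
# token "the" is stored once and the two entries are identical.
the_is_shared = tokens[1] === tokens[4]

# Convert back to a String when string operations are needed.
word = string(tokens[2])
```

The vector then holds references to the interned symbols rather than a fresh string per token, which is where the memory saving would come from.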

