I would be more concerned about style than speed -- using symbols as interned strings is an ancient Lisp technique in NLP, but IMO a Dict of strings would be better style.
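A minimal sketch of the Dict approach, assuming a current Julia release. The helper name `intern!` and the `pool` variable are hypothetical, chosen only for illustration; they are not part of Base or of any package mentioned in this thread.

```julia
# Hypothetical helper: return the canonical copy of `s`,
# storing it in `pool` the first time it is seen.
function intern!(pool::Dict{String,String}, s::AbstractString)
    key = String(s)              # turns a SubString into a String; no copy if already a String
    return get!(pool, key, key)  # existing canonical copy, or insert and return this one
end

pool = Dict{String,String}()
tokens = [intern!(pool, t) for t in split("the cat sat on the mat")]

# `===` on Strings compares contents, so compare data pointers
# to check that both occurrences of "the" share one object.
println(pointer(tokens[1]) == pointer(tokens[5]))
println(length(pool))  # number of distinct tokens held in the pool
```

Each repeated token still allocates a short-lived temporary while it is looked up, but only one copy per distinct token is retained in the token list.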
Also see http://juliastats.github.io/DataFrames.jl/stable/man/pooling/ .

Best,

Tamas

On Fri, Apr 22 2016, Lyndon White wrote:

> When tokenizing large files,
> it is normal to end up with many, many copies of the same string.
>
> Normal Julia strings are not interned.
> Which means if you accumulate a large list of tokens,
> you end up duplicating a lot of strings, which uses unnecessary memory.
>
> When you are tokenizing documents that are multiple gigabytes long,
> this really adds up.
>
>
> `symbols` *are* interned.
> Are there any downsides to using them when an interned string is required?
>
> I tried testing them for it a while ago, and got huge improvements in memory
> use, and thus also in speed (allocating memory is expensive).
>
> There are no `convert` methods defined for switching between symbols and
> strings, but
> `string(::Symbol)` and `symbol(::AbstractString)` work.
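For comparison, the symbol-based approach from the quoted message can be sketched as follows. This assumes a current Julia release, where the constructor is spelled `Symbol(...)`; the lowercase `symbol(...)` in the quoted message is the older spelling. The variable name `sym_tokens` is illustrative only.

```julia
# Convert each token to a Symbol; equal names map to the same interned symbol.
sym_tokens = [Symbol(t) for t in split("the cat sat on the mat")]

# Both occurrences of "the" are the identical symbol.
println(sym_tokens[1] === sym_tokens[5])

# Convert back to an ordinary String when string operations are needed.
println(string(sym_tokens[1]))
```

Converting back with `string` allocates a fresh String each time, so this pays off mainly when tokens are stored and compared far more often than they are printed or manipulated as text.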
