Better but not competitive: 547s with symbols.
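
For reference, a symbol-keyed version of the quoted function might look like the sketch below. It is not necessarily the exact code behind the 547s figure. The name `unigrams_sym` is hypothetical, and the sketch assumes a current Julia release, where the constructor is spelled `Symbol(t)` rather than the `symbol(t)` used in the thread and the argument type is `AbstractString`.

```julia
# Sketch only: count whitespace-separated tokens in a file, keyed by Symbol.
# `unigrams_sym` is a hypothetical name; `Symbol(t)` is the current spelling
# of the `symbol(t)` call suggested in the thread.
function unigrams_sym(fn::AbstractString)
    grams = Dict{Symbol,Int}()
    open(fn) do f                 # the do-block closes the file even on error
        for line in eachline(f)
            for t in split(line)
                s = Symbol(t)     # convert the token to an interned Symbol
                grams[s] = get(grams, s, 0) + 1
            end
        end
    end
    return grams
end
```

Calling `unigrams_sym("corpus.txt")` (the file name is a placeholder) returns a `Dict{Symbol,Int}`. Because symbols are interned, key hashing and comparison can be cheaper than for strings, though each distinct token still pays for the conversion once per occurrence.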
On Wednesday, November 19, 2014 6:51:43 PM UTC-8, tshort wrote:
>
> You could try using symbols instead of strings. Replace t with symbol(t).
> On Nov 19, 2014 8:06 PM, "Greg Lee" <[email protected]> wrote:
>
>> Is there a faster way to do the following, which builds a dictionary of
>> unique tokens and counts?
>>
>> function unigrams(fn::String)
>>     grams = Dict{String,Int32}()
>>     f = open(fn)
>>     for line in eachline(f)
>>         for t in split(line)
>>             i = get(grams,t,0)
>>             grams[t] = i+1
>>         end
>>     end
>>     close(f)
>>     return grams
>> end
>>
>> On a file with 1.9M unique tokens, this is 8x slower than Python written
>> in the same style. The big hit comes from string keys; using int keys is
>> closer to Python's performance. Timings: Julia 1083s, Python 126s, c++
>> 80s.
>>
