Is there a faster way to do the following, which builds a dictionary
mapping unique tokens to their counts?
function unigrams(fn::String)
    grams = Dict{String,Int32}()
    f = open(fn)
    for line in eachline(f)
        for t in split(line)
            i = get(grams, t, 0)
            grams[t] = i + 1
        end
    end
    close(f)
    return grams
end
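
For reference, it gets called roughly like this (the corpus path below is
just a placeholder, not the actual file):

@time grams = unigrams("corpus.txt")    # placeholder path, not the real 1.9M-token file
println(length(grams), " unique tokens")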
On a file with 1.9M unique tokens, this is roughly 8x slower than Python
code written in the same style. The big hit seems to come from the string
keys; with int keys the timing is much closer to Python's. Timings:
Julia 1083s, Python 126s, C++ 80s.
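
To illustrate the string-key vs. int-key gap, here is a small synthetic
benchmark sketch. The sizes and the count_keys helper are made up for this
test, not taken from the original corpus:

# Hypothetical micro-benchmark: same update loop, keyed by ints vs. strings.
function count_keys(keys)
    d = Dict{eltype(keys),Int32}()
    for k in keys
        d[k] = get(d, k, 0) + 1
    end
    return d
end

int_keys    = rand(1:200_000, 2_000_000)        # simulated token ids
string_keys = [string(k) for k in int_keys]     # the same data as strings

@time count_keys(int_keys)      # int keys: the faster case
@time count_keys(string_keys)   # string keys: the slow case described above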