Is there a faster way to do the following, which builds a dictionary mapping 
unique tokens to their counts?

function unigrams(fn::String)
    grams = Dict{String,Int32}()
    f = open(fn)
    for line in eachline(f)
        for t in split(line)
            # look up the current count (0 if unseen) and bump it
            i = get(grams, t, 0)
            grams[t] = i + 1
        end
    end
    close(f)
    return grams
end


On a file with 1.9M unique tokens, this is about 8x slower than Python code 
written in the same style.  The big hit comes from the string keys; with 
integer keys the Julia version is much closer to Python's performance.  
Timings: Julia 1083 s, Python 126 s, C++ 80 s.
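
For comparison, here is a minimal sketch of the same loop with two small 
tweaks: the dictionary is pre-sized with sizehint! (the 2_000_000 hint is 
just a guess based on the 1.9M-unique-token figure above), and the file 
handle is managed by an open ... do block.  Untested, and it still uses 
string keys, so it may not close much of the gap.

function unigrams2(fn::String)
    grams = Dict{String,Int32}()
    # pre-size the table to reduce rehashing as it grows;
    # 2_000_000 is only a guess from the figure quoted above
    sizehint!(grams, 2_000_000)
    open(fn) do f
        for line in eachline(f)
            for t in split(line)
                # same count-and-bump update, written as one expression
                grams[t] = get(grams, t, 0) + 1
            end
        end
    end
    return grams
end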
