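You could try using symbols instead of strings: replace t with symbol(t) and declare the dictionary as Dict{Symbol,Int32}. In newer Julia versions the function is spelled Symbol(t).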
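A minimal sketch of that suggestion, assuming Julia 1.x (hence Symbol(t) rather than symbol(t)). The function name unigrams_sym is hypothetical. Symbols are interned, so two equal symbols are the same object and dictionary comparisons are identity checks rather than character-by-character string comparisons. Each Symbol(t) call still has to look the text up in the symbol table, though, so whether this beats String keys on a given Julia version is something to measure, not assume.

```julia
# Hypothetical variant of unigrams() that keys the dictionary on Symbols.
# Assumes Julia 1.x, where the constructor is spelled Symbol(t).
function unigrams_sym(fn::AbstractString)
    grams = Dict{Symbol,Int}()
    open(fn) do f                      # the file is closed automatically
        for line in eachline(f)
            for t in split(line)       # t is a SubString of the line
                s = Symbol(t)          # intern the token text
                grams[s] = get(grams, s, 0) + 1
            end
        end
    end
    return grams
end
```

Using open(fn) do ... end also guarantees the file is closed even if an error is thrown partway through the loop.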
On Nov 19, 2014 8:06 PM, "Greg Lee" <[email protected]> wrote:

> Is there a faster way to do the following, which builds a dictionary of
> unique tokens and counts?
>
> function unigrams(fn::String)
>     grams = Dict{String,Int32}()
>     f = open(fn)
>     for line in eachline(f)
>         for t in split(line)
>             i = get(grams,t,0)
>             grams[t] = i+1
>         end
>     end
>     close(f)
>     return grams
> end
>
>
> On a file with 1.9M unique tokens, this is 8x slower than Python written
> in the same style. The big hit comes from string keys; switching to
> integer keys brings it closer to Python's performance. Timings: Julia
> 1083s, Python 126s, C++ 80s.
>
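To check the String-versus-Symbol question on your own machine and Julia version, here is a small hypothetical harness. The function names, the synthetic token list, and its sizes (10^6 tokens over 10^5 distinct words) are all assumptions for illustration, not the data from the timings quoted above. Each function is called once before timing so that compilation is not counted.

```julia
# Hypothetical harness: String-keyed vs Symbol-keyed counting on the same tokens.
function count_strings(tokens::Vector{String})
    grams = Dict{String,Int}()
    for t in tokens
        grams[t] = get(grams, t, 0) + 1
    end
    return grams
end

function count_symbols(tokens::Vector{String})
    grams = Dict{Symbol,Int}()
    for t in tokens
        s = Symbol(t)                  # intern before the lookup
        grams[s] = get(grams, s, 0) + 1
    end
    return grams
end

# Synthetic data (assumed sizes): 10^6 tokens drawn from 10^5 distinct words.
tokens = ["w$(rand(1:100_000))" for _ in 1:1_000_000]

# First calls compile both functions and keep the results for checking.
ds = count_strings(tokens)
dy = count_symbols(tokens)

# Both dictionaries must agree on every count.
@assert length(ds) == length(dy)
@assert all(ds[k] == dy[Symbol(k)] for k in keys(ds))

t_str = @elapsed count_strings(tokens)
t_sym = @elapsed count_symbols(tokens)
println("String keys: $(t_str) s, Symbol keys: $(t_sym) s")
```

A single in-memory run like this only gives a rough comparison; the file-reading cost in the original benchmark is not included.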
