Better, but still not competitive: 547s with symbols, versus 1083s with string keys and 126s for Python.
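
For reference, here is a minimal sketch of the symbol-keyed variant, not necessarily the exact code that was timed. It uses current Julia spelling (`Symbol(t)` and `AbstractString`; in 2014 these were `symbol(t)` and `String`). The name `unigrams_sym` and the extra `IO` method are assumptions added for illustration.

```julia
# Sketch only: `unigrams_sym` is a hypothetical name, not from the original post.
# Counts whitespace-separated tokens, keyed by Symbol instead of String.
function unigrams_sym(io::IO)
    grams = Dict{Symbol,Int32}()
    for line in eachline(io)
        for t in split(line)
            k = Symbol(t)                    # equal tokens map to the same interned Symbol
            grams[k] = get(grams, k, 0) + 1  # start missing keys at 0, then increment
        end
    end
    return grams
end

# File-based wrapper: open(f, filename) passes the stream to f and closes it afterwards.
unigrams_sym(fn::AbstractString) = open(unigrams_sym, fn)
```

The `IO` method makes it easy to try on a small in-memory input, e.g. `unigrams_sym(IOBuffer("a b a\nb a\n"))`, before pointing it at the full file.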

On Wednesday, November 19, 2014 6:51:43 PM UTC-8, tshort wrote:
>
> You could try using symbols instead of strings. Replace t with symbol(t). 
> On Nov 19, 2014 8:06 PM, "Greg Lee" wrote:
>
>> Is there a faster way to do the following, which builds a dictionary of 
>> unique tokens and counts?
>>
>> function unigrams(fn::String)
>>     grams = Dict{String,Int32}()
>>     f = open(fn)
>>     for line in eachline(f)
>>         for t in split(line)
>>             i = get(grams,t,0)
>>             grams[t] = i+1
>>         end
>>     end
>>     close(f)
>>     return grams
>> end
>>
>>
>> On a file with 1.9M unique tokens, this is about 8x slower than Python
>> written in the same style. The big hit comes from string keys; using int
>> keys is closer to Python's performance. Timings: Julia 1083s, Python 126s,
>> C++ 80s.
>>
>
