https://github.com/JuliaLang/julia/issues/8826

Just curious, what do you get if you replace

    for t in split(line)

with

    words = split(line)
    for i = 1:length(words)
        t = words[i]

?
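
In case it helps, here is a rough sketch of your function with that change
applied. Everything else is copied from your original, except that I renamed
the count variable to n (my choice, not in your code) so it does not clash
with the loop index i. I have not timed this, so treat it as something to
try rather than a known fix:

```julia
function unigrams(fn::String)
    grams = Dict{String,Int32}()
    f = open(fn)
    for line in eachline(f)
        words = split(line)          # split once and keep the resulting array
        for i = 1:length(words)      # index into the array instead of iterating it
            t = words[i]
            n = get(grams, t, 0)     # current count, 0 if the token is new
            grams[t] = n + 1
        end
    end
    close(f)
    return grams
end
```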

-Mike

On Wed, Nov 19, 2014 at 10:47 PM, Greg Lee <[email protected]> wrote:
> Better but not competitive: 547s with symbols.
>
> On Wednesday, November 19, 2014 6:51:43 PM UTC-8, tshort wrote:
>>
>> You could try using symbols instead of strings. Replace t with symbol(t).
>>
>> On Nov 19, 2014 8:06 PM, "Greg Lee" <[email protected]> wrote:
>>>
>>> Is there a faster way to do the following, which builds a dictionary of
>>> unique tokens and counts?
>>>
>>> function unigrams(fn::String)
>>>     grams = Dict{String,Int32}()
>>>     f = open(fn)
>>>     for line in eachline(f)
>>>         for t in split(line)
>>>             i = get(grams,t,0)
>>>             grams[t] = i+1
>>>         end
>>>     end
>>>     close(f)
>>>     return grams
>>> end
>>>
>>>
>>> On a file with 1.9M unique tokens, this is about 8x slower than Python
>>> written in the same style. The big hit comes from string keys; with integer
>>> keys, Julia is closer to Python's performance. Timings: Julia 1083s,
>>> Python 126s, C++ 80s.
