Using Symbols seems to help, i.e.:

using DataStructures
function wordcounter_sym(filename)
    # Accumulator keyed on Symbol instead of String
    counts = counter(Symbol)
    words = split(readall(filename),
                  Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!']), false)
    for w in words
        add!(counts, symbol(w))   # symbol() interns the word, so hashing is cheap
    end
    return counts
end

On my system this cut the time from 0.67 sec to 0.48 sec, about 30% less.
Memory use is also quite a bit lower.
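
For reference, a minimal way to try it at the REPL (the filename below is just a placeholder for whatever text file you are counting):

julia> using DataStructures

julia> @time counts = wordcounter_sym("somefile.txt")   # placeholder filename

julia> counts[:the]   # counts is an Accumulator, so individual words can be looked up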

On Tuesday, March 4, 2014 8:58:51 PM UTC-5, Roman Sinayev wrote:
>
> I updated the gist with times and code snippets
> https://gist.github.com/lqdc/9342237
>
> On Tuesday, March 4, 2014 5:15:29 PM UTC-8, Steven G. Johnson wrote:
>>
>> It's odd that the performance gain that you see is so much less than the 
>> gain on my machine.
>>
>> Try putting @time in front of "for w in words" and also in front of 
>> "words=...".   That will tell you how much time is being spent in each, and 
>> whether the limitation is really hashing performance.
>>
>> On Tuesday, March 4, 2014 7:55:12 PM UTC-5, Roman Sinayev wrote:
>>>
>>> I got to about 0.55 seconds with the above suggestions. Still about 2x 
>>> slower than Python unfortunately.
>>> The reason I find it necessary for hashing to work quickly is that I use 
>>> it heavily for both NLP and when serving data on a Julia webserver.
>>>
>>
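
Following up on Steven's @time suggestion above, here is a rough sketch of where the two annotations would go in the Symbol version (the function name is just for illustration):

function wordcounter_sym_timed(filename)
    counts = counter(Symbol)
    # time the read + split step separately from the counting loop
    @time words = split(readall(filename),
                        Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!']), false)
    @time for w in words
        add!(counts, symbol(w))
    end
    return counts
end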
