Nice. Mine actually takes 30% more memory now (as I understand Steven's 
comment, mostly because we're making a copy of the Symbol), but the time is 
~5% faster. Still about 0.55 s, though.
Did you run the function several times in the REPL? I am getting these 
numbers when running the script from the command line.

Also, wouldn't the result be a dictionary keyed by symbols that I would then 
have to convert back to strings for further analysis?
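
(For reference, roughly what I mean, as an untested sketch: copy the counts 
into a String-keyed Dict. The stringify_counts name is just for illustration, 
and it assumes the counter can be iterated like a Dict of Symbol => count 
pairs.)

function stringify_counts(counts)
    strcounts = Dict{String,Int}()
    for (k, v) in counts
        # string(k) turns the Symbol key back into a plain string
        strcounts[string(k)] = v
    end
    return strcounts
end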

On Tuesday, March 4, 2014 7:52:19 PM UTC-8, Keith Campbell wrote:
>
> Using Symbols seems to help, ie:
>
> using DataStructures
> function wordcounter_sym(filename)
>     counts = counter(Symbol)
>     words = split(readall(filename), Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!']), false)
>     for w in words
>         add!(counts, symbol(w))
>     end
>     return counts
> end
>
> On my system this cut the time from 0.67 sec to 0.48 sec, about 30% less.
> Memory use is also quite a bit lower.
>
> On Tuesday, March 4, 2014 8:58:51 PM UTC-5, Roman Sinayev wrote:
>>
>> I updated the gist with times and code snippets
>> https://gist.github.com/lqdc/9342237
>>
>> On Tuesday, March 4, 2014 5:15:29 PM UTC-8, Steven G. Johnson wrote:
>>>
>>> It's odd that the performance gain you see is so much less than the gain 
>>> on my machine.
>>>
>>> Try putting @time in front of "for w in words" and also in front of 
>>> "words=...".   That will tell you how much time is being spent in each, and 
>>> whether the limitation is really hashing performance.
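
(A rough, untested sketch of where those @time calls would go, based on 
Keith's wordcounter_sym above; the wordcounter_timed name is just for 
illustration:)

using DataStructures

function wordcounter_timed(filename)
    counts = counter(Symbol)
    # time the read/split step on its own
    @time words = split(readall(filename), Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!']), false)
    # time the hashing/counting loop separately
    @time for w in words
        add!(counts, symbol(w))
    end
    return counts
end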
>>>
>>> On Tuesday, March 4, 2014 7:55:12 PM UTC-5, Roman Sinayev wrote:
>>>>
>>>> I got to about 0.55 seconds with the above suggestions. Still about 2x 
>>>> slower than Python, unfortunately.
>>>> The reason I find it necessary for hashing to work quickly is that I use 
>>>> it heavily both for NLP and for serving data on a Julia webserver.
>>>>
>>>
