Using Symbols seems to help, i.e.:

    using DataStructures

    function wordcounter_sym(filename)
        counts = counter(Symbol)
        words = split(readall(filename),
                      Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!']), false)
        for w in words
            add!(counts, symbol(w))
        end
        return counts
    end
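(If you want to reproduce the timings below, something roughly along these lines should do; the file path here is just a placeholder:)

    using DataStructures

    @time counts = wordcounter_sym("somefile.txt")   # prints elapsed time and bytes allocated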
On my system this cut the time from 0.67 sec to 0.48 sec, about 30% less. Memory use is also quite a bit lower.

On Tuesday, March 4, 2014 8:58:51 PM UTC-5, Roman Sinayev wrote:
> I updated the gist with times and code snippets
> https://gist.github.com/lqdc/9342237
>
> On Tuesday, March 4, 2014 5:15:29 PM UTC-8, Steven G. Johnson wrote:
>> It's odd that the performance gain that you see is so much less than the
>> gain on my machine.
>>
>> Try putting @time in front of "for w in words" and also in front of
>> "words=...". That will tell you how much time is being spent in each, and
>> whether the limitation is really hashing performance.
>>
>> On Tuesday, March 4, 2014 7:55:12 PM UTC-5, Roman Sinayev wrote:
>>> I got to about 0.55 seconds with the above suggestions. Still about 2x
>>> slower than Python unfortunately.
>>> The reason I find it necessary for hashing to work quickly is that I use
>>> it heavily for both NLP and when serving data on a Julia webserver.
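For the per-stage @time suggestion quoted above, a minimal sketch of what the instrumented function might look like (the same code as before, just with @time in front of the split and the counting loop so you can see where the time goes):

    function wordcounter_sym(filename)
        counts = counter(Symbol)
        # time the read-and-split step
        @time words = split(readall(filename),
                            Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!']), false)
        # time the hashing/counting loop
        @time for w in words
            add!(counts, symbol(w))
        end
        return counts
    end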