Nice. Mine actually takes 30% more memory now (from how I understand Steven's comment, mostly because we're making a copy of the Symbol), but the time is ~5% faster. Still about 0.55 s, though. Did you run the function several times in the REPL? I'm getting these numbers when running the script from the command line.
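For reference, here is roughly how I'm timing it from the command line (the file name is just a placeholder for my test file); the first call is only a warm-up, so JIT compilation doesn't land in the measured number:

# timing.jl -- rough harness; "bigfile.txt" stands in for my test file
using DataStructures

# Keith's wordcounter_sym from the quote below
function wordcounter_sym(filename)
    counts = counter(Symbol)
    words = split(readall(filename),
                  Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!']),
                  false)
    for w in words
        add!(counts, symbol(w))
    end
    return counts
end

wordcounter_sym("bigfile.txt")         # warm-up run: compiles the method
@time wordcounter_sym("bigfile.txt")   # measured run

and then running it as julia timing.jl a couple of times.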
Also, wouldn't the result be a dictionary keyed by Symbols that I would then have to convert back to strings for further analysis? (Rough sketch of what I mean below the quoted thread.)

On Tuesday, March 4, 2014 7:52:19 PM UTC-8, Keith Campbell wrote:
>
> Using Symbols seems to help, i.e.:
>
> using DataStructures
> function wordcounter_sym(filename)
>     counts = counter(Symbol)
>     words = split(readall(filename),
>                   Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!']),
>                   false)
>     for w in words
>         add!(counts, symbol(w))
>     end
>     return counts
> end
>
> On my system this cut the time from 0.67 sec to 0.48 sec, about 30% less.
> Memory use is also quite a bit lower.
>
> On Tuesday, March 4, 2014 8:58:51 PM UTC-5, Roman Sinayev wrote:
>>
>> I updated the gist with times and code snippets:
>> https://gist.github.com/lqdc/9342237
>>
>> On Tuesday, March 4, 2014 5:15:29 PM UTC-8, Steven G. Johnson wrote:
>>>
>>> It's odd that the performance gain that you see is so much less than the
>>> gain on my machine.
>>>
>>> Try putting @time in front of "for w in words" and also in front of
>>> "words=...". That will tell you how much time is being spent in each, and
>>> whether the limitation is really hashing performance.
>>>
>>> On Tuesday, March 4, 2014 7:55:12 PM UTC-5, Roman Sinayev wrote:
>>>>
>>>> I got to about 0.55 seconds with the above suggestions. Still about 2x
>>>> slower than Python, unfortunately.
>>>> The reason I find it necessary for hashing to work quickly is that I
>>>> use it heavily for both NLP and when serving data on a Julia webserver.
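To be concrete about the Symbol-to-string question above, this is the kind of conversion I'd expect to need afterwards (assuming the counter can be iterated like a Dict{Symbol,Int}; I haven't checked whether the Accumulator supports that):

counts = wordcounter_sym("bigfile.txt")   # same placeholder file as above
str_counts = Dict{String,Int}()
for (sym, n) in counts
    str_counts[string(sym)] = n           # string() turns the Symbol back into the word
end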