[julia-users] Re: Hashing speed question

Keith Campbell Tue, 04 Mar 2014 09:16:29 -0800

As a general matter, Python has probably been more heavily optimized for 
text IO and dictionary performance.

However, in this case the counter in the DataStructures.jl library is your 
friend.  On my system it runs almost 2x as fast as the code in your gist, 
which presumably makes it roughly equivalent to the Python version:

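using DataStructures
function wordcounter(filename)
    # Read the whole file into memory
    fid = open(filename)
    text = readall(fid)
    close(fid)
    # Counter keyed on the substrings that split returns
    counts = counter(SubString{UTF8String})
    # Split on whitespace and punctuation; false drops empty strings
    words = split(text, Set([' ','\n','\r','\t','-','.',',',':','_','"',';','!']), false)
    for w in words
        add!(counts, w)
    end
    return counts
end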

@time ln = wordcounter("/tmp/juliaH2It5v");
elapsed time: 0.697941716 seconds (195758904 bytes allocated)

Whereas the gist version ran in 1.183864661 seconds.
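
For comparison, here is a minimal sketch of the same counting loop using only 
a plain Dict from Base, with no package dependency. It assumes a Julia 1.x 
release, where split takes a keepempty keyword. The names DELIMS and 
wordcounter_dict are hypothetical, and it takes the text directly rather than 
a filename so it can be tried on a short string. I have not timed it against 
the versions above.

```julia
# Same delimiter set as above, written as a tuple of characters.
# DELIMS and wordcounter_dict are illustrative names, not from the gist.
const DELIMS = (' ', '\n', '\r', '\t', '-', '.', ',', ':', '_', '"', ';', '!')

function wordcounter_dict(text::String)
    # Keys are the substrings returned by split, so no word is copied
    counts = Dict{SubString{String},Int}()
    for w in split(text, DELIMS; keepempty=false)
        # get returns 0 for a word that has not been seen yet
        counts[w] = get(counts, w, 0) + 1
    end
    return counts
end

counts = wordcounter_dict("the cat, the hat")
```

To run it on a file, read(filename, String) would supply the text argument.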

On Tuesday, March 4, 2014 3:15:21 AM UTC-5, Roman Sinayev wrote:
>
> Why is Julia 2x slower than Python on this test?
>
> https://gist.github.com/lqdc/9342237
>
