Is it zlib's crc32() function that you're comparing against? From a quick look at the zlib source, it does a fair bit of manual loop unrolling and also processes the input 4 bytes at a time rather than one. Given those differences, the speed gap might not be so surprising.
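
To make the "4 bytes at a time" point concrete, here is a rough sketch in C of a plain byte-at-a-time table lookup next to a slicing-by-4 variant. This is not zlib's actual source, just an illustration of the general technique as I understand it; the names (crc_table, crc32_init_tables, crc32_bytewise, crc32_slice4) are made up for this example, and it assumes the usual reflected CRC-32 polynomial 0xEDB88320.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; crc_table[0] is the classic byte-wise table,
   crc_table[1..3] are the extra tables used for slicing-by-4. */
static uint32_t crc_table[4][256];

/* Build all four tables for the reflected polynomial 0xEDB88320. */
static void crc32_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[0][i] = c;
    }
    /* Table k holds the effect of pushing k more zero bytes through. */
    for (uint32_t i = 0; i < 256; i++) {
        for (int k = 1; k < 4; k++) {
            uint32_t prev = crc_table[k - 1][i];
            crc_table[k][i] = (prev >> 8) ^ crc_table[0][prev & 0xFF];
        }
    }
}

/* One table lookup per input byte. */
static uint32_t crc32_bytewise(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = (crc >> 8) ^ crc_table[0][(crc ^ *buf++) & 0xFF];
    return crc ^ 0xFFFFFFFFu;
}

/* Four independent lookups per 32-bit chunk, then a byte-wise tail. */
static uint32_t crc32_slice4(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len >= 4) {
        /* Assemble the chunk byte by byte so host endianness does not matter. */
        crc ^= (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
               ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
        crc = crc_table[3][crc & 0xFF] ^
              crc_table[2][(crc >> 8) & 0xFF] ^
              crc_table[1][(crc >> 16) & 0xFF] ^
              crc_table[0][crc >> 24];
        buf += 4;
        len -= 4;
    }
    while (len--)
        crc = (crc >> 8) ^ crc_table[0][(crc ^ *buf++) & 0xFF];
    return crc ^ 0xFFFFFFFFu;
}
```

Call crc32_init_tables() once before using either function. Both should return the same value for the same input; the second one simply does fewer loop iterations and its four lookups do not depend on each other, which is presumably part of where the C code gets its advantage.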

On Thu, Apr 10, 2014 at 10:00 AM, andrew cooke <[email protected]> wrote:
>
> The fastest routine at
> https://github.com/andrewcooke/CRC.jl/blob/master/test/speed.jl is 2.6x
> slower than C code.
>
> I've tried to isolate things so it's easy to hack and experiment with. If
> anyone can beat my best code (which - credit to Julia - is also the
> simplest; anything I try to make it faster just makes it slower) I'd love to
> know...
>
> Cheers,
> Andrew
