I put together a small benchmark app for just the two checksum algorithms
(code available on request). I ran the same amount of data
through each one in exactly the same pattern. The results look like
this:
Adler32: 1983 ms
CRC32: 6514 ms
Ratio (Adler32/CRC32): 0.30442125
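For reference, the benchmark is roughly the following shape (a sketch,
not my exact code; the buffer size and iteration count here are made
up, and mine differ):

import java.util.Random;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumBench {
  public static void main(String[] args) {
    byte[] data = new byte[64 * 1024];   // made-up buffer size
    new Random(42).nextBytes(data);
    time(new Adler32(), data);           // warm-up so JIT compilation
    time(new CRC32(), data);             // doesn't skew the first timing
    System.out.println("Adler32: " + time(new Adler32(), data) + " ms");
    System.out.println("CRC32:   " + time(new CRC32(), data) + " ms");
  }

  static long time(Checksum sum, byte[] data) {
    long start = System.currentTimeMillis();
    for (int i = 0; i < 100000; i++) {   // made-up iteration count
      sum.reset();
      sum.update(data, 0, data.length);
    }
    if (sum.getValue() == -1)            // consume the value; getValue()
      throw new AssertionError();        // is never -1, this just keeps
                                         // the loop from being optimized away
    return System.currentTimeMillis() - start;
  }
}

The point is just that both implementations run over identical data
through the shared Checksum interface, so the two timings are directly
comparable.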
The ratio holds across different test lengths, too. This would seem
to indicate that there's a fair bit of benefit to be had from
switching to Adler32. From looking at the HDFS code, it even seems to
be written against the Checksum interface rather than one particular
implementation, so it doesn't look like it would be hard to swap this in.
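Concretely, since java.util.zip.Adler32 and java.util.zip.CRC32 both
implement Checksum, the swap should mostly be a matter of changing the
construction site. A toy illustration (not actual HDFS code):

import java.util.zip.Adler32;
import java.util.zip.Checksum;

public class SwapDemo {
  public static void main(String[] args) {
    // where the code currently constructs a CRC32, hand back an
    // Adler32 instead; everything downstream that is written against
    // Checksum keeps working unchanged
    Checksum sum = new Adler32();
    byte[] chunk = new byte[512];
    sum.update(chunk, 0, chunk.length);
    System.out.println(Long.toHexString(sum.getValue()));
  }
}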
I might still take the time to build an isolated benchmark that
exercises the actual Hadoop code, but I thought I'd share these
intermediate results.
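On Doug's zero-fill suggestion (quoted below): I imagine the stand-in
would be a no-op implementation of Checksum, something like this
sketch (NullChecksum is a name I made up, and wiring it through the
datanode is exactly the part I'm unsure about):

import java.util.zip.Checksum;

// hypothetical stand-in, for benchmarking only: computes nothing,
// always reports zero
public class NullChecksum implements Checksum {
  public void update(int b) {}
  public void update(byte[] b, int off, int len) {}
  public long getValue() { return 0; }
  public void reset() {}
}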
-Bryan
On Oct 7, 2008, at 10:31 AM, Doug Cutting wrote:
Don't try this on anything but an experimental filesystem. If you
can simply find the places where HDFS calls the CRC algorithm and
replace them with zeros, then you should be able to get a
reasonable benchmark.
Doug
Bryan Duxbury wrote:
I'm willing to give this a shot. Let me just be sure I understand
what I'd have to do: if I make it stop computing CRCs altogether,
I also need to change the datanode so that it stops checking the
validity of CRCs, right? Will this break anything interesting
and unexpected?
On Oct 6, 2008, at 4:58 PM, Doug Cutting wrote:
Bryan Duxbury wrote:
I am profiling with YourKit on random reducers. I'm also running
on HDFS, so I don't know how one would go about disabling CRCs.
Hack the CRC-computing code to fill things with zeros?
Doug