Hi, I modified cksum to use the well-known slice-by-8 algorithm for the CRC calculation, which makes it faster. On my machine it is several times faster than the unmodified cksum. It took me a while to figure out, since the CRC calculation in cksum shifts in the opposite direction from most other implementations I've seen. I would be glad if someone could check this patch on a big-endian machine to see if it produces the correct output! I think it might, but I'm not sure.
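For anyone who has not seen the technique, here is a minimal sketch of slice-by-8 for a CRC that shifts left (most significant bit first), which is the direction cksum uses with the polynomial 0x04C11DB7. This is not the code from the patch: the names crc_table, crc_init_tables, crc_update_bytewise and crc_update_slice8 are made up for illustration, and the sketch leaves out the length bytes and the final complement that cksum applies to its result.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC_POLY 0x04C11DB7u  /* CRC-32 polynomial used by POSIX cksum */

/* crc_table[k][i] is the CRC of byte i followed by k zero bytes. */
static uint32_t crc_table[8][256];

/* Must be called once before either update function. */
static void crc_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i << 24;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 0x80000000u) ? (c << 1) ^ CRC_POLY : (c << 1);
        crc_table[0][i] = c;
    }
    /* Each further table pushes one more zero byte through the CRC. */
    for (int k = 1; k < 8; k++) {
        for (int i = 0; i < 256; i++) {
            uint32_t prev = crc_table[k - 1][i];
            crc_table[k][i] = (prev << 8) ^ crc_table[0][prev >> 24];
        }
    }
}

/* Classic one-byte-at-a-time update, MSB first. */
static uint32_t crc_update_bytewise(uint32_t crc, const unsigned char *p,
                                    size_t n)
{
    while (n--)
        crc = (crc << 8) ^ crc_table[0][(crc >> 24) ^ *p++];
    return crc;
}

/* Slice-by-8 update: eight input bytes per iteration, eight lookups. */
static uint32_t crc_update_slice8(uint32_t crc, const unsigned char *p,
                                  size_t n)
{
    while (n >= 8) {
        /* Bytes are assembled one by one, so host byte order is irrelevant. */
        uint32_t hi = crc ^ ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                             (uint32_t)p[2] << 8 | (uint32_t)p[3]);
        uint32_t lo = (uint32_t)p[4] << 24 | (uint32_t)p[5] << 16 |
                      (uint32_t)p[6] << 8 | (uint32_t)p[7];
        crc = crc_table[7][hi >> 24] ^ crc_table[6][(hi >> 16) & 0xFF] ^
              crc_table[5][(hi >> 8) & 0xFF] ^ crc_table[4][hi & 0xFF] ^
              crc_table[3][lo >> 24] ^ crc_table[2][(lo >> 16) & 0xFF] ^
              crc_table[1][(lo >> 8) & 0xFF] ^ crc_table[0][lo & 0xFF];
        p += 8;
        n -= 8;
    }
    /* Fewer than eight bytes remain; finish with the byte-wise loop. */
    return crc_update_bytewise(crc, p, n);
}
```

The sketch builds its two 32-bit words from individual bytes, which avoids any dependence on host endianness at some cost in speed. An implementation that loads whole words from memory instead would have to byte-swap them on little-endian hosts, and that is the kind of code path where a big-endian test is useful.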
You can see the patch here: https://github.com/coreutils/coreutils/pull/43

--
/Kristoffer Brånemyr