nor to contain the most optimized algorithms around.
This D1 code, adapted from C code, is much more efficient (and in D2, with ranges and TypeTuple-foreach, it could become both more efficient and much shorter), but I think something like this is overkill for Phobos:
http://dpaste.dzfl.pl/cf97d15ade27

Bye,
bearophile
