------- Comment #2 from zsojka at seznam dot cz 2009-07-16 15:42 -------
When data[j] = ((i + j) & 0xFF00) >> 8; is replaced by data[j] = (i + j) >> 8;
the generated asm uses "shr eax, 8" instead of "movzx eax, ah", and runs in 19 ticks on average.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40772
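
For reference, a minimal sketch of why the two forms can be swapped. It assumes data[] has 8-bit elements (the comment does not state the type), so the store truncates the shifted value and the 0xFF00 mask becomes redundant. The helper names high_byte_masked, high_byte_shifted and count_mismatches are hypothetical and are not taken from the original testcase.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the destination element is 8 bits wide, as data[j] is taken to be. */

/* Form from the original testcase: mask bits 8..15, then shift down. */
static uint8_t high_byte_masked(uint32_t v)
{
    return (uint8_t)((v & 0xFF00u) >> 8);
}

/* Replacement form: shift only; the conversion to uint8_t drops the upper bits. */
static uint8_t high_byte_shifted(uint32_t v)
{
    return (uint8_t)(v >> 8);
}

/* Counts the inputs in [0, limit) for which the two forms disagree. */
static unsigned count_mismatches(uint32_t limit)
{
    unsigned mismatches = 0;
    for (uint32_t v = 0; v < limit; ++v) {
        if (high_byte_masked(v) != high_byte_shifted(v))
            ++mismatches;
    }
    return mismatches;
}
```

If the element type were wider than 8 bits, the two forms would differ as soon as i + j reaches 0x10000, so the replacement is only safe under the assumption stated above.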