Hi,

Did you test on any 1, 2, 4, or 8 CPU machines, just to see if there are any performance degradations on lower CPU counts?
Also, yeah, the MOD operator in each loop could get spendy on older CPUs (e.g. my MIPS CPUs, older ARM stuff, etc.). Is it possible to achieve much the same autotuning with power-of-two operations instead of divide/mod?

-a
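
To make the pow2 suggestion concrete, here is a minimal sketch, assuming the autotuned size can be rounded up to a power of two so that the per-iteration modulo becomes a single AND. The names roundup_pow2 and mod_pow2 are hypothetical and are not taken from the patch under discussion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round n up to the next power of two.
 * Assumes 1 <= n <= 2^31 so the shift cannot overflow.
 */
static inline uint32_t
roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return (p);
}

/*
 * Hypothetical helper: same result as (i % size), but only valid
 * when size is a power of two. Uses one AND instead of a divide.
 */
static inline uint32_t
mod_pow2(uint32_t i, uint32_t size)
{
	/* A power of two has exactly one bit set. */
	assert(size != 0 && (size & (size - 1)) == 0);
	return (i & (size - 1));
}
```

The rounding would happen once, at tuning time, and only the mask would run in the loop. The trade-off is that the tuned value can only take power-of-two steps, so it may end up almost twice as large as the exact value.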