In Sun, 7 Apr 2019 23:06:45 +0500 Nikita Zlobin <cook60020...@mail.ru> wrote:
> I really did not recognize that nasty trick, clearing xmm0 :).
> Also i understood, why SSE can't be used there. Without integer
> division support it is undoable with SSE - replacing with
> multiplication means conversion to float.
>

I recently discovered a fast integer division algorithm that accelerates multiple divisions by the same divisor. I got it working this way, but then discovered that gcc already uses this method, so it is still doable with SSE.

On the other hand, I still can't find enough places where working with colors as single integers, rather than as separate channel values, would give a meaningful benefit. One such place is the accumulator used for averaging. While the input is uint8_t[4], the accumulator is uint16_t[4]. I have to either work with them element by element or use masks, bit shifts and OR for each element, just to prepare a single value and store it (either uint32_t[2] or just one uint64_t).

It looks like benchmarks are necessary, along with these intrinsics, to test whether integer SSE is really better than what gcc generates.

_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
https://lists.linuxaudio.org/listinfo/linux-audio-dev
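
For illustration, here is a minimal scalar sketch of dividing many values by the same divisor with one multiply and one shift, applied to the averaging case described above. It shows the general reciprocal-multiplication idea and is not necessarily the exact algorithm mentioned in the message. The names recip16, recip16_init, recip16_div and average_rgba are hypothetical, and the channel order in the packed result (channel 0 in the lowest byte) is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper type: precomputed reciprocal for one fixed divisor. */
typedef struct {
    uint64_t magic; /* floor(2^32 / d) + 1 */
} recip16;

/* Precompute once per divisor. d must be in 1..65535 (d == 0 is invalid). */
static recip16 recip16_init(uint16_t d)
{
    recip16 r = { (UINT64_C(1) << 32) / d + 1 };
    return r;
}

/* Returns n / d for any 16-bit n, using one 64-bit multiply and one shift.
   Exact because n * (d * magic - 2^32) stays below 2^32 for 16-bit n and d. */
static uint16_t recip16_div(uint16_t n, recip16 r)
{
    return (uint16_t)(((uint64_t)n * r.magic) >> 32);
}

/* Divide four uint16_t channel sums by the same divisor and pack the
   results into one uint32_t, channel 0 in the lowest byte (assumed order).
   Assumes each sum is at most 255 * divisor, so every quotient fits in 8 bits. */
static uint32_t average_rgba(const uint16_t acc[4], recip16 r)
{
    uint32_t packed = 0;
    for (int i = 0; i < 4; i++)
        packed |= (uint32_t)(recip16_div(acc[i], r) & 0xFF) << (8 * i);
    return packed;
}
```

The reciprocal is computed once and reused for every pixel, so the per-channel cost is a multiply, a shift, and the mask/shift/OR packing step. Whether this beats what the compiler generates on its own is exactly what a benchmark would have to show.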