Hi folks,

I measured how many u += a % b operations my Celeron 366
can perform in one second.

(One example use of the % operation is boundary clipping code.)

About 9 million ops/sec, which is disappointing
(it uses the idivl instruction).
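
A measurement like this could be sketched roughly as below. This is not the
original test program; the function name mod_ops_per_sec and the operand
values are made up for illustration, and the volatile accesses and loop
overhead are included in the figure, so take the result as a rough number only.

```c
#include <time.h>

/* Hypothetical helper (not the original benchmark): runs n iterations of
   u += a % b and returns the measured operations per second.
   Returns 0.0 if the run was too short for clock() to resolve. */
static double mod_ops_per_sec(unsigned long n)
{
    /* volatile so the compiler cannot fold or hoist a % b */
    volatile unsigned int a = 123456789u;
    volatile unsigned int b = 1000u;
    volatile unsigned int u = 0;
    unsigned long i;
    clock_t start, end;

    start = clock();
    for (i = 0; i < n; i++)
        u += a % b;
    end = clock();

    if (end <= start)
        return 0.0;
    return (double)n / ((double)(end - start) / CLOCKS_PER_SEC);
}
```

Calling it with a few tens of millions of iterations should give a stable
enough figure; replacing a % b with a & (b - 1) in the loop gives the
comparison number.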

When I use & (and) instead of %, the speed is boosted
to 150 million ops/sec, more than a factor of 16!

But to use & instead of %, b in the a % b operation must be
a power of 2 (and a must not be negative).

(In this case a % b is replaced with a & bminus1,
where bminus1 = b - 1  :-)  )
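
A minimal sketch of the replacement, assuming unsigned operands. The helper
names is_power_of_two and mod_pow2 are invented here for illustration and are
not taken from any driver code.

```c
#include <assert.h>

/* Nonzero if b is a power of two (exactly one bit set). */
static int is_power_of_two(unsigned int b)
{
    return b != 0 && (b & (b - 1)) == 0;
}

/* Computes a % b with a mask instead of a division.
   Only valid when b is a power of two. */
static unsigned int mod_pow2(unsigned int a, unsigned int b)
{
    assert(is_power_of_two(b));
    return a & (b - 1);   /* b - 1 is the "bminus1" mask */
}
```

The operands are unsigned on purpose: for a negative signed a, the C %
operator gives a negative (or zero) remainder, while the mask always gives a
value in 0 .. b-1, so the two are not interchangeable there.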

In the audio field we could use this optimization
when the buffer size is a power of 2.

I just grepped through the ALSA driver code and there
are some "% frag_size" operations.

I think on almost all cards frag_size (the fragment size) is a power
of two, therefore we could replace the % with the &.

ALSA does a few % operations for every fragment played.

But even if & is 16 times faster than %, take very small fragments,
say 1 msec fragments, and assume ALSA does 5 % operations per fragment.

Then we would save at most:
1000 (frags/sec) * 5 (% ops per fragment) = 5000 ops per sec
every % operation takes  0.11 usecs  ( 1 / 9000000)

5000 * 0.11 = 550 usecs saved per second (assuming that & has zero cost).

That is about 550 usecs / 1000000 usecs = an insignificant 0.055 % performance
gain.
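
The same back-of-the-envelope estimate as a small function, in case someone
wants to plug in other numbers. The name percent_saved and its parameters are
made up for this sketch, and it assumes, as above, that & costs nothing.

```c
/* Percentage of CPU time spent on % operations, i.e. the upper bound on
   what replacing them with & could save.
   frags_per_sec:   fragments played per second
   mods_per_frag:   % operations done per fragment
   mod_ops_per_sec: measured % throughput of the CPU */
static double percent_saved(double frags_per_sec,
                            double mods_per_frag,
                            double mod_ops_per_sec)
{
    /* seconds spent in % during one second of playback */
    double seconds = frags_per_sec * mods_per_frag / mod_ops_per_sec;
    return seconds * 100.0;
}
```

With the figures from this mail, percent_saved(1000.0, 5.0, 9000000.0) comes
out a little under 0.06 (percent).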

not worth the trouble IMHO
:-)


do you agree ?

PS: getting too paranoid sometimes , eh ?  :-)

Benno.
