Hi Jeff,

There's a good chance that your compiler outsmarted you, i.e. parts of your test were optimized out. I suggest using something like "benchmark" for tests. Also, make sure that the variables in your test cannot be optimized out.

Cheers
Johannes

On 08.10.23 00:22, Jeff R wrote:
I modified a simple Volk sqrt program for an ARM1176JZ-S processor to test performance, and the results are puzzling. The following program prints:


dur_VolkSqrt=(0.000000)0.001721 dur_CRTLSqrt=(0.000000)0.000318


The following processor information is displayed. It appears as though NEON is supported.


~/volk-3.0.0/build# cpu_features/list_cpu_features
arch            : aarch64
implementer     :  65 (0x41)
variant         :   0 (0x00)
part            : 3336 (0xD08)
revision        :   3 (0x03)
flags           : asimd,cpuid,crc32,fp


Why are the numbers so slow for Volk versus the CRTL? I may be missing something obvious. Thank you in advance.


Here’s the test program:



// g++ -I /usr/local/include/volk volk_sqrt.cpp -o volk_sqrt -L /usr/local/lib64/ -lvolk
// export LD_LIBRARY_PATH=/usr/local/lib64; ./volk_sqrt

#include <stdio.h>
#include <math.h>
#include <volk.h>
#include <limits.h>
#include <time.h>
#include <sys/time.h>

double get_wall_time()
{
    struct timeval time;

    if (gettimeofday(&time,NULL))
    {
        //  Handle error
        return 0;
    }
    return (double)time.tv_sec + (double)time.tv_usec * .000001;
}

int main(int argc, char* args[])
{
    double walStop;
    double walStart;
    double dur_VolkSqrt;
    double dur_CRTLSqrt;
    int N = 1024*16;

    unsigned int alignment = volk_get_alignment();
    float* in = (float*)volk_malloc(sizeof(float)*N, alignment);
    float* out = (float*)volk_malloc(sizeof(float)*N, alignment);

    for(unsigned int ii = 0; ii < N; ++ii)
    {
        in[ii] = (float)(ii*ii);
    }

    walStart = get_wall_time();
    volk_32f_sqrt_32f_a(out, in, N);
    //volk_32f_sqrt_32f(out, in, N);
    walStop = get_wall_time();
    dur_VolkSqrt = walStop - walStart;

    walStart = get_wall_time();
    for(unsigned int ii = 0; ii < N; ++ii)
    {
        out[ii] = sqrt(in[ii]);
    }
    walStop = get_wall_time();
    dur_CRTLSqrt = walStop - walStart;

    printf("dur_VolkSqrt=(%f)%f dur_CRTLSqrt=(%f)%f\n", dur_VolkSqrt/N, dur_VolkSqrt, dur_CRTLSqrt/N, dur_CRTLSqrt);

    volk_free(in);
    volk_free(out);
    return 0;
}

