Hey, you are completely right, that's the point. The matrix is 1000x1000, and the OpenCL kernel only becomes faster than the generic implementation above roughly 500x500 (just a rough estimate). Most use cases in GNU Radio do not hit that regime.
But if you want to promote VOLK outside the GNU Radio context, this feature is quite unique. As far as I know, the SIMD support of OpenCL is pretty bad (I am talking about the CPU frontend), and VOLK could combine proper SIMD use with GPU acceleration. Moreover, I think there are some efficient encoder/decoder algorithms for GPUs which could make use of such an integration.

Greetings
Stefan

On 12/17/2015 07:14 PM, Sylvain Munaut wrote:
> Hi,
>
>> RUN_VOLK_TESTS: volk_32f_x2_matrix_nxn_multiply_puppet_32f(1000000,10)
>> generic completed in 28482ms
>> a_opencl completed in 13364.3ms
>
> The question is how that number changes for smaller problem sizes,
> and what the average problem size encountered in a real environment
> would be.
>
> For SIMD optimizations, the answer to "who's the fastest" doesn't vary
> much with problem size, because they have little setup/teardown cost.
> For OpenCL I very much doubt that is the case, and given the default
> buffer size of GR, I suspect the calls to VOLK aren't processing
> millions of samples at a time in a single call, so an app may end up
> making a lot of "smallish" calls where the setup cost dominates.
>
> Cheers,
>
> Sylvain
> _______________________________________________
> Discuss-gnuradio mailing list
> [email protected]
> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
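Sylvain's setup/teardown argument can be illustrated with a toy cost model. This is only a sketch: the cubic per-backend cost and all constants below are my assumptions for illustration, not measurements from the thread.

```python
# Toy cost model for the SIMD-vs-OpenCL crossover discussed above.
# Assumption (not from the thread): an n x n matrix multiply costs
# roughly k * n^3 per backend, and the accelerated path additionally
# pays a fixed setup/teardown overhead on every call.

def crossover(setup_overhead, k_fast, k_slow):
    """Smallest matrix side n at which the fast-but-overhead backend
    (setup_overhead + k_fast * n**3) beats the plain backend
    (k_slow * n**3)."""
    n = 1
    while setup_overhead + k_fast * n ** 3 >= k_slow * n ** 3:
        n += 1
    return n

# Illustrative constants: the fast backend is 2x faster per element
# but pays a fixed cost of 1e6 time units per call.
print(crossover(1_000_000, 1.0, 2.0))  # -> 101
```

Below the crossover size, the fixed per-call overhead swamps the per-element speedup, which is exactly why lots of "smallish" calls (as in typical GR buffer sizes) would favor the SIMD path.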
