On Thu, Dec 17, 2015 at 1:14 PM, Sylvain Munaut <[email protected]> wrote:
> Hi,
>
> > RUN_VOLK_TESTS: volk_32f_x2_matrix_nxn_multiply_puppet_32f(1000000,10)
> > generic completed in 28482ms
> > a_opencl completed in 13364.3ms
>
> Question is how does that number change for smaller problem sizes ?
> And what would be the average problem size encountered in real env.
>
> For SIMD optimization the result of "who's the fastest" doesn't vary
> too much depending on problem size because they don't have much setup
> / teardown size.
> For OpenCL I very much doubt that would be the case and if you end up
> with an app making a lot of "smallish" (and given the default buffer
> size of GR, I feel the calls to volk aren't processing millions of
> samples at a time in a single call)
>
>
> Cheers,
>
> Sylvain
>

Stefan,

This is a great start. But Sylvain makes good points about the data transfer issue. That's definitely a problem we have to think about. It's why we have avoided pursuing GPU support in VOLK in the past. Now, if heterogeneous processor technologies change, so might this problem.

On the other hand, Doug Geiger has made progress on building OpenCL support into the buffer structure of the scheduler. What you've done here might work better as a block designed around this concept.

Tom
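Sylvain's point about setup/teardown cost can be made concrete with a simple fixed-overhead model: a SIMD kernel costs roughly a constant per item, while an offloaded (e.g. OpenCL) call pays an extra fixed cost per call before its lower per-item cost starts to pay off. The sketch below is a hypothetical illustration only: the function names and the cost figures are placeholders, not measurements from VOLK, GNU Radio, or any OpenCL runtime.

```python
def break_even_items(fixed_overhead_s: float,
                     cpu_cost_per_item_s: float,
                     gpu_cost_per_item_s: float) -> float:
    """Return the call size N above which the offloaded path is faster.

    Hypothetical model:
      t_cpu(N) = N * cpu_cost_per_item_s
      t_gpu(N) = fixed_overhead_s + N * gpu_cost_per_item_s
    gpu_cost_per_item_s is assumed to include host<->device transfer per item.
    """
    if gpu_cost_per_item_s >= cpu_cost_per_item_s:
        # Offload is never cheaper per item, so no call size makes it win.
        return float("inf")
    return fixed_overhead_s / (cpu_cost_per_item_s - gpu_cost_per_item_s)


def faster_path(n_items: int, fixed_overhead_s: float,
                cpu_cost_per_item_s: float, gpu_cost_per_item_s: float) -> str:
    """Pick "cpu" or "gpu" for a single call of n_items under the model above."""
    t_cpu = n_items * cpu_cost_per_item_s
    t_gpu = fixed_overhead_s + n_items * gpu_cost_per_item_s
    return "gpu" if t_gpu < t_cpu else "cpu"


if __name__ == "__main__":
    # Assumed costs: 50 us fixed per offloaded call, 10 ns/item CPU, 5 ns/item GPU.
    print(break_even_items(50e-6, 10e-9, 5e-9))   # roughly 10,000 items
    print(faster_path(8192, 50e-6, 10e-9, 5e-9))  # a "smallish" call
    print(faster_path(100000, 50e-6, 10e-9, 5e-9))
```

Under these made-up numbers, a call of a few thousand items stays faster on the CPU path and only much larger calls benefit from offloading, which is the shape of the concern raised above; real break-even points would have to be measured.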
_______________________________________________
Discuss-gnuradio mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
