On 12/18/2015 12:30 AM, Tom Rondeau wrote:
> On Thu, Dec 17, 2015 at 1:14 PM, Sylvain Munaut <[email protected]> wrote:
>
>> Hi,
>>
>>> RUN_VOLK_TESTS: volk_32f_x2_matrix_nxn_multiply_puppet_32f(1000000,10)
>>> generic completed in 28482ms
>>> a_opencl completed in 13364.3ms
>>
>> Question is how does that number change for smaller problem sizes ?
>> And what would be the average problem size encountered in real env.
>>
>> For SIMD optimization the result of "who's the fastest" doesn't vary
>> too much depending on problem size because they don't have much setup
>> / teardown size.
>> For OpenCL I very much doubt that would be the case and if you end up
>> with an app making a lot of "smallish" (and given the default buffer
>> size of GR, I feel the calls to volk aren't processing millions of
>> samples at a time in a single call)
>>
>> Cheers,
>>
>> Sylvain
>
> Stefan,
>
> This is a great start. But Sylvain makes good points about the data
> transfer issue. That's definitely a problem we have to think about. It's
> why we have avoided pursuing GPU support in VOLK in the past. Now, if
> heterogeneous processor technologies change, so might this problem.
>
> On the other hand, Doug Geiger has made progress on building OpenCL support
> into the buffer structure of the scheduler. What you've done here might
> work better as a block designed around this concept.
>
> Tom
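Sylvain's point can be made concrete with a toy cost model: a SIMD kernel costs roughly a constant per item, while an offloaded (e.g. OpenCL) kernel pays a fixed per-call overhead for setup, host/device transfer latency and teardown before its cheaper per-item cost pays off. This is only a sketch under assumed numbers: every constant and function name below is hypothetical, not a measurement from VOLK or OpenCL and not part of any VOLK API. It shows why a single large benchmark call cannot tell you which kernel wins for the small calls a flowgraph typically makes, and how a dispatcher could fall back to the generic kernel below a break-even size.

```python
import math
from typing import Optional

# Hypothetical costs in nanoseconds; NOT measured VOLK/OpenCL numbers.
FIXED_OFFLOAD_NS = 50_000   # per-call setup + transfer latency + teardown
GENERIC_PER_ITEM_NS = 10    # CPU/SIMD cost per item
OFFLOAD_PER_ITEM_NS = 4     # device cost per item, including bandwidth


def generic_cost_ns(n: int) -> int:
    """CPU/SIMD path: negligible setup, cost grows linearly with n."""
    return n * GENERIC_PER_ITEM_NS


def offload_cost_ns(n: int) -> int:
    """Offloaded path: fixed per-call overhead plus a cheaper linear term."""
    return FIXED_OFFLOAD_NS + n * OFFLOAD_PER_ITEM_NS


def break_even_items() -> Optional[int]:
    """Smallest n for which the offloaded path is faster than the CPU path."""
    saving = GENERIC_PER_ITEM_NS - OFFLOAD_PER_ITEM_NS
    if saving <= 0:
        return None  # the device never amortizes its fixed overhead
    return math.ceil(FIXED_OFFLOAD_NS / saving)


def pick_kernel(n: int) -> str:
    """Hypothetical dispatcher: offload only when it is strictly cheaper."""
    return "offload" if offload_cost_ns(n) < generic_cost_ns(n) else "generic"


if __name__ == "__main__":
    print("break-even:", break_even_items(), "items per call")
    for n in (1_000, 8_192, 100_000, 1_000_000):
        print(f"n={n:>9}: generic={generic_cost_ns(n):>10} ns, "
              f"offload={offload_cost_ns(n):>10} ns -> {pick_kernel(n)}")
```

With these made-up constants the offloaded path only wins above about 8,334 items per call; real break-even points would have to be measured per device, per kernel and per transfer path.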
Hi,

I had just wondered why this hadn't been done yet, but I see the problems now (Sylvain made the point). If proper device selection and initialization were integrated into VOLK, the same processing could probably be reused by the scheduler (e.g., with a generic fallback). Then again, I don't think I know enough about all of this ;)

Greetings
Stefan
