Hello,

On Fri, Jun 5, 2015 at 2:22 PM, Francisco Jerez <curroje...@riseup.net> wrote:
> Giuseppe Bilotta <giuseppe.bilo...@gmail.com> writes:
>>
>> Ok, scratch that. I was confused by the fact that Beignet reports a
>> preferred work-group size multiple of 16. Intel IGPs support _logical_
>> SIMD width of up to 32, but the _hardware_ SIMD width is just 4. So
>> the question is if here we should report the _hardware_ width, or the
>> maximum _logical_ width.
>>
> The physical SIMD width of any Intel GPU that as far as I'm aware ILO
> supports is 8, however, the hardware can execute 16- and in some cases
> 32-wide instructions by splitting them internally into instructions of
> the native SIMD width.
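The splitting Francisco describes boils down to simple arithmetic: a logical SIMD16 or SIMD32 instruction costs two or four native-width issues. The sketch below assumes plain splitting into native-width chunks with a native width of 8 as he states (or 4, per the manuals cited in the reply); `hardware_passes` is a hypothetical helper, not code from ILO or Beignet.

```python
def hardware_passes(logical_width: int, native_width: int = 8) -> int:
    """Hypothetical helper: how many native-width instructions are needed to
    execute one logical SIMD instruction, assuming the hardware simply splits
    it into consecutive chunks of `native_width` channels."""
    if logical_width <= 0 or native_width <= 0:
        raise ValueError("widths must be positive")
    # Integer ceiling division: a partial chunk still costs a full pass.
    return (logical_width + native_width - 1) // native_width


for width in (8, 16, 32):
    print(f"SIMD{width}: {hardware_passes(width)} pass(es) at native width 8, "
          f"{hardware_passes(width, native_width=4)} at native width 4")
```

Whichever native width is assumed, the logical width only changes how many passes each instruction takes, which is why the choice of what to report in the cap is a question of hinting rather than correctness.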
Well, according to the Gen7.5 and 8 manuals I found on Intel's site,
it's actually 4, although each execution unit has 2 FPUs. If the FPUs
can execute different (and independent) instructions, then the “lower
SIMD limit” would be 4, not 8, although in practice each execution unit
has 8 PEs available.

[snip]

> As this cap is just a performance hint, I think it makes sense to assume
> the best-case scenario as Grigori has done. If the driver later on
> decides it doesn't pay off to use the maximum SIMD width it can always
> use less, but using more may be difficult if the application didn't keep
> it in mind while choosing the workgroup layout.

OTOH, at least in OpenCL, this cap wouldn't be used 'raw' as a
performance hint, since the actual value returned (the
PREFERRED_WORK_GROUP_SIZE_MULTIPLE) is a kernel property rather than a
device property, so it may be tuned at kernel compilation time,
according to effective work-item SIMD usage.

In this sense I think the cap itself should be a 'lower limit', i.e.
the value under which the kernel simply cannot fully utilize the
hardware.

IOW, I believe that if a larger group size than the physical SIMD width
is needed for a specific kernel to fully utilize the hardware, this
should be handled higher up in the stack, not at the level of this cap,
since the value here is going to be manipulated _anyway_ (e.g. a kernel
written for float16 might even end up recommending a work-group size
multiple of 1, as an extreme example).

-- 
Giuseppe "Oblomov" Bilotta
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
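A rough model of the "lower limit" argument above: the device cap says how many SIMD lanes must be filled, and a per-kernel value (reported in OpenCL through the `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` query of `clGetKernelWorkGroupInfo`) is derived from it at compile time. The function name and the assumption that each work-item's vector components are spread across SIMD lanes are hypothetical, chosen only to reproduce the float16 example; this is not how Mesa, clover or Beignet actually compute the value.

```python
def preferred_work_group_size_multiple(device_simd_width: int,
                                       lanes_per_work_item: int) -> int:
    """Hypothetical kernel-level heuristic (not Mesa/clover/Beignet code).

    `device_simd_width` is the cap, read as a lower limit: the number of
    SIMD lanes that must be busy for the hardware to be fully used.
    `lanes_per_work_item` is an ASSUMED measure of how many lanes a single
    work-item keeps busy (e.g. 4 for float4 code, 16 for float16 code if the
    compiler spreads vector components across lanes).
    """
    if device_simd_width <= 0 or lanes_per_work_item <= 0:
        raise ValueError("widths must be positive")
    # Fewer work-items are needed when each one fills more lanes; never go below 1.
    return max(1, device_simd_width // lanes_per_work_item)


# Device cap taken as the physical SIMD width of 8, as Francisco describes.
for kind, lanes in (("scalar float", 1), ("float4", 4), ("float16", 16)):
    print(kind, preferred_work_group_size_multiple(8, lanes))
```

Under this model the device cap only has to be a correct lower bound; any widening or narrowing of the recommended multiple happens per kernel, higher up in the stack, which is the layering the email argues for.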