Hi,

On Mon, Feb 11, 2013 at 11:58 PM, Chad Fraleigh wrote:
> Two questions:
>
> Currently in CPUDevice/CPUDeviceTask, where optimization can be used, it
> first calls the system_cpu_support_sse2() check, and then, if that is
> unavailable, tries system_cpu_support_sse3(). I guess this is also a
> two-part question in itself. 1) Is it false to assume SSE3 would be
> better than SSE2? If SSE3 is better, shouldn't this check come first,
> followed by the less ideal SSE2 (followed by the even less ideal basic
> impl)? 2) If a CPU supports SSE3, would it always also support SSE2,
> and so, as-is, effectively never use the SSE3 implementation (as SSE2
> always gets used instead)?
The order should indeed be switched; I will commit a fix for that. And
yes, in practice any CPU supporting SSE3 will also support SSE2.

> The other thing is, since CPUDeviceTask is already OO-based, rather
> than doing checks each time an optimizable method is called to
> determine what implementation to use, wouldn't it be cleaner to make
> CPUDeviceTask semi-abstract and create three subclasses (e.g.
> BasicCPUDeviceTask, SSE2CPUDeviceTask, SSE3CPUDeviceTask), each with
> its custom impl, and just have task_add() [or something] decide which
> to create? Depending on how often these methods are called it may or
> may not save much time (by not doing those checks each time), but it
> would seem more maintainable than having several related #ifdef's and
> system_cpu_support_*()'s scattered about. It might also eventually
> help allow other implementations to be dropped in without needing
> large chunks of the core CPUDeviceTask modified (i.e. if pluggable
> support for devices is ever reached/to be reached).

Subclassing would indeed be possible; I think it's just a matter of
preference when you do that vs. just adding some if statements. At the
moment I don't think it would clarify much, but if the code gets bigger
it might be a good change.

> Also, for CPUs that support (and thus require) SSE, how hard would it
> be to compile the non-optimized calls (and functions) out to reduce
> the final executable size, as that code will never be called in these
> cases? I.e. the final 'else' part of the 'if/else if' would be removed
> and an #else used instead (on WITH_OPTIMIZED_KERNEL) for the
> non-optimized parts. Ok.. this makes it 3.5 questions total! =)

I guess it's possible, but the plan is to add explicit SSE instructions
eventually, and then it's nice for testing to be able to quickly try
the non-SSE version even if the CPU does not need it.
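For illustration, a minimal C++ sketch of the two dispatch styles discussed here, under stated assumptions: an if/else-if chain that tests the most capable instruction set first, and the subclass-per-variant alternative where the choice is made once when the task is created. CPUCaps, KernelVariant, select_kernel() and create_task() are hypothetical names, and the three task classes only borrow the names proposed in the thread; in Cycles the flags would come from system_cpu_support_sse2()/system_cpu_support_sse3(), which are stubbed out here so the sketch compiles on its own. The hypothetical force_basic flag keeps the non-SSE path reachable for testing, as suggested above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical capability flags. In Cycles these would be filled from
// system_cpu_support_sse2() and system_cpu_support_sse3().
struct CPUCaps {
  bool sse2;
  bool sse3;
};

enum class KernelVariant { Basic, SSE2, SSE3 };

// If/else-if dispatch: test the most capable instruction set first.
// Testing SSE2 first would shadow SSE3, since SSE3 CPUs also have SSE2.
KernelVariant select_kernel(const CPUCaps &caps, bool force_basic = false)
{
  if (force_basic)  // lets the non-SSE path be tried on any CPU
    return KernelVariant::Basic;
  if (caps.sse3)
    return KernelVariant::SSE3;
  if (caps.sse2)
    return KernelVariant::SSE2;
  return KernelVariant::Basic;
}

// Subclass alternative: the variant is chosen once, at task creation,
// instead of being re-checked in every optimizable method.
// Class names follow the thread's proposal; they are hypothetical.
class CPUDeviceTaskImpl {
 public:
  virtual ~CPUDeviceTaskImpl() = default;
  virtual std::string name() const = 0;
};

class BasicCPUDeviceTask : public CPUDeviceTaskImpl {
 public:
  std::string name() const override { return "basic"; }
};

class SSE2CPUDeviceTask : public CPUDeviceTaskImpl {
 public:
  std::string name() const override { return "sse2"; }
};

class SSE3CPUDeviceTask : public CPUDeviceTaskImpl {
 public:
  std::string name() const override { return "sse3"; }
};

// Factory standing in for "task_add() [or something]" from the thread.
std::unique_ptr<CPUDeviceTaskImpl> create_task(const CPUCaps &caps,
                                               bool force_basic = false)
{
  std::unique_ptr<CPUDeviceTaskImpl> task;
  switch (select_kernel(caps, force_basic)) {
    case KernelVariant::SSE3:
      task = std::make_unique<SSE3CPUDeviceTask>();
      break;
    case KernelVariant::SSE2:
      task = std::make_unique<SSE2CPUDeviceTask>();
      break;
    case KernelVariant::Basic:
      task = std::make_unique<BasicCPUDeviceTask>();
      break;
  }
  assert(task != nullptr);  // every enum value is handled above
  return task;
}
```

With the subclass version, a new variant (say SSE4 or AVX) means one more class and one more case in the factory, rather than another branch in each optimizable method; whether that is worth it is the matter of preference mentioned in the reply.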
The plan is to take advantage of SSE4/AVX in the future too, so we'll
probably get a few more options, but this only applies to the
performance-critical kernel, so I don't think binary size is that
important.

Brecht.
_______________________________________________
Bf-committers mailing list
http://lists.blender.org/mailman/listinfo/bf-committers
