> But I came across the attribute "Attribute[GroupSize(64)]" about which
> the Brook documentation says, it's for newer ATI cards to allow multiple
> kernels to share the same memory area.

This attribute signals the compiler to generate compute shader code
instead of pixel shader code. Compute shader code has access to
additional GPU features. It's been a while, but I think we needed it for
the register indexing feature. It also allows register sharing and a few
other things.

> Looking deeper into the source, I didn't find any explicit on-card
> memory sharing using the keyword "shared" for uint4[].

That's because the Brook code does not use the register indexing
feature; there is no syntax support for it. The Brook code can also be
compiled to pixel shader, and it is reasonable that there is a
performance increase because of some architectural properties. But the
manual IL code, which uses the register indexing feature and is
therefore forced to run in compute shader mode, will not run on an
HD3xxx, which only supports pixel shader code. The manual IL is around
100% faster IIRC (compared to the compiled Brook IL on the same
hardware), because the register indexing feature allows you to store
intermediate results in registers instead of DRAM. There is no good
reason to distribute suboptimal code that runs on old hardware.

> For me it looked like the code should work without any sharing of
> on-card memory between threads. But I don't have any experience
> regarding GPU coding, so I tried.
>
> I simply removed the attribute, recompiled the .il and the Stream
> Analyzer now reports that the models "FireStream" and "Radeon HD
> 3/4/5xxx" will run the kernel.
> (funny that the throughput for HD5870 rises from 2041 MThreads/sec to
> 2189 MThreads/sec, that's a 7% speedup)

What is MThreads/sec?

_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
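
As a quick check on the speedup quoted in the message: the two HD5870 throughput figures do work out to roughly a 7% gain. A minimal Python sketch follows; the function name `relative_speedup` and the variable names are made up for illustration, and since the units cancel, the result holds whatever MThreads/sec turns out to mean exactly.

```python
def relative_speedup(before: float, after: float) -> float:
    """Return the fractional throughput gain going from `before` to `after`."""
    return (after - before) / before


# Figures quoted in the message for the HD5870, in MThreads/sec.
with_attribute = 2041.0      # kernel compiled with Attribute[GroupSize(64)]
without_attribute = 2189.0   # same kernel with the attribute removed

gain = relative_speedup(with_attribute, without_attribute)
print(f"relative gain: {gain:.1%}")
```

Note that this compares only the two compiled Brook variants; it says nothing about the hand-written IL, which the reply estimates at around 100% faster.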
