> But I came across the attribute "Attribute[GroupSize(64)]", about which
> the Brook documentation says it's for newer ATI cards, to allow multiple
> kernels to share the same memory area.

This attribute tells the compiler to generate compute shader code
instead of pixel shader code. Compute shader code has access to
additional GPU features. It's been a while, but I think we needed it
for the register indexing feature. It also allows register sharing
and a few other things.

> Looking deeper into the source, I didn't find any explicit on-card
> memory sharing using the keyword "shared" for uint4[].

That's because the Brook code does not use the register indexing feature;
there is no syntax support for it. The Brook code can also be compiled
to pixel shader, and it is plausible that this gives a performance
increase because of some architectural properties. But the manual IL code
uses the register indexing feature and is therefore forced to run in
compute shader mode, so it will not run on an HD3xxx, which only supports
pixel shader code.
The manual IL is around 100% faster IIRC (compared to the compiled Brook
IL on the same hardware, because the register indexing feature allows you
to store intermediate results in registers instead of DRAM).

There is no good reason to distribute suboptimal code that runs on old
hardware.

> 
> For me it looked like the code should work without any sharing of
> on-card memory between threads. But I don't have any experience
> regarding GPU coding, so I tried.
> 
> I simply removed the attribute, recompiled the .il and the Stream
> Analyzer now reports that the models "FireStream" and "Radeon HD
> 3/4/5xxx" will run the kernel.
> (funny that the throughput for HD5870 rises from 2041 MThreads/sec to
> 2189 MThreads/sec, that's a 7% speedup)
> 
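
For reference, the 7% figure can be checked from the two numbers quoted
above. This is only a small sketch of the arithmetic; the function and
variable names are made up for illustration and do not come from any tool.

```python
def speedup_percent(before: float, after: float) -> float:
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted above for the HD5870, in MThreads/sec.
# The variable names are illustrative assumptions.
with_attribute = 2041.0
without_attribute = 2189.0

gain = speedup_percent(with_attribute, without_attribute)
print(f"{gain:.2f}% faster without the attribute")
```

By the same measure, code that is 100% faster has twice the throughput.
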

What is MThreads/sec?