[Mesa-dev] [PATCH v4 12/14] nv50/ir: set number of threads/block for variable local size

Samuel Pitoiset Wed, 05 Oct 2016 11:49:12 -0700

When a variable local size is defined as specified by
ARB_compute_variable_group_size, the fixed local size is set to 0
and a SIGFPE occurs when we compute the maximum number of regs.

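To avoid the division by zero, fall back to a default number of
threads/block when the fixed local size is 0. This allows the use of
64 GPRs/thread.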

v4: - use 512 threads on Fermi, 1024 on Kepler+

Signed-off-by: Samuel Pitoiset <[email protected]>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h
index 4a701f7..eaf50cc 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h
@@ -175,6 +175,8 @@ public:
 
    virtual void parseDriverInfo(const struct nv50_ir_prog_info *info) {
       threads = info->prop.cp.numThreads;
+      if (threads == 0)
+         threads = info->target >= NVISA_GK104_CHIPSET ? 1024 : 512;
    }
 
    virtual bool runLegalizePass(Program *, CGStage stage) const = 0;
-- 
2.10.0
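
For readers following the reasoning in the commit message, here is a minimal standalone sketch of why a thread count of 0 is a problem and how the fallback avoids it. It is not the driver code: effectiveThreads, gprsPerThread, kKeplerChipset (a stand-in for NVISA_GK104_CHIPSET, value assumed) and the register file sizes used in the note below are hypothetical names and numbers chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for NVISA_GK104_CHIPSET; the value is an assumption.
constexpr uint32_t kKeplerChipset = 0xe0;

// Same fallback as in the patch: a thread count of 0 means the local size
// is variable, so assume the largest block size (512 on Fermi, 1024 on
// Kepler+) instead of dividing by zero later on.
uint32_t effectiveThreads(uint32_t numThreads, uint32_t chipset)
{
   if (numThreads == 0)
      return chipset >= kKeplerChipset ? 1024 : 512;
   return numThreads;
}

// Illustrative register budget per thread. regFileSize is an assumed
// register count and is not taken from the patch.
uint32_t gprsPerThread(uint32_t regFileSize, uint32_t numThreads,
                       uint32_t chipset)
{
   // Without the fallback, numThreads == 0 would raise SIGFPE here.
   return regFileSize / effectiveThreads(numThreads, chipset);
}
```

Under the assumed register file sizes of 32768 (Fermi) and 65536 (Kepler+), both gprsPerThread(32768, 0, 0xc0) and gprsPerThread(65536, 0, 0xe4) evaluate to 64, which matches the 64 GPRs/thread mentioned in the commit message.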


[Mesa-dev] [PATCH v4 12/14] nv50/ir: set number of threads/block for variable local size
