roastduck commented on a change in pull request #5498:
URL: https://github.com/apache/incubator-tvm/pull/5498#discussion_r419055679
##########
File path: src/tir/transforms/lower_warp_memory.cc
##########
@@ -265,10 +265,11 @@ class WarpAccessRewriter : protected StmtExprMutator {
<< op->index << " local_index=" << local_index;
PrimExpr load_value = LoadNode::make(
op->dtype, op->buffer_var, local_index, op->predicate);
+ PrimExpr mask = IntImm(DataType::UInt(32), 0xFFFFFFFF);
Review comment:
Setting mask to `0xFFFFFFFF` here might not work, because this call site
is probably inside a branch, for example the boundary check `if (threadIdx.x <
n)`, where some lanes named in the mask never reach the shuffle. It would be
great if we could generate an accurate mask here. Otherwise we should probably
keep the old `__shfl` call for compatibility, maybe by adding an "unknown"
option to mask. To make it more explicit, downgrading `__shfl_sync` to
`__shfl` is straightforward, but upgrading `__shfl` to `__shfl_sync` is not.
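To illustrate what an "accurate mask" could look like in the generated code,
here is a minimal hand-written CUDA sketch, not what this PR emits. It assumes
a single warp of 32 threads and `1 <= n <= 32`; the names `shift_left_guarded`
and `run_shift_demo` are hypothetical. The mask is computed with
`__ballot_sync` before the boundary check, while all lanes are still
converged, and then passed to the shuffle inside the branch.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each lane with tid < n reads its right neighbor's value.
// Assumes it is launched with exactly one warp (32 threads).
__global__ void shift_left_guarded(const float* in, float* out, int n) {
  int tid = threadIdx.x;
  // Every lane of the warp reaches this line, so a full mask is valid here.
  // The result has one bit set for each lane that will enter the branch.
  unsigned mask = __ballot_sync(0xFFFFFFFFu, tid < n);
  if (tid < n) {
    float v = in[tid];
    // Only lanes with tid < n get here, and `mask` names exactly those lanes.
    // Passing 0xFFFFFFFF instead would name lanes that never arrive.
    float neighbor = __shfl_down_sync(mask, v, 1);
    // The last participating lane has no valid neighbor, so it keeps its own
    // value and the shuffled result is discarded.
    out[tid] = (tid + 1 < n) ? neighbor : v;
  }
}

// Hypothetical host-side check; returns true if the output matches.
bool run_shift_demo(int n) {
  std::vector<float> h_in(n), h_out(n, -1.0f);
  for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

  float* d_in = nullptr;
  float* d_out = nullptr;
  size_t bytes = n * sizeof(float);
  if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

  shift_left_guarded<<<1, 32>>>(d_in, d_out, n);

  // This blocking copy also surfaces any error from the kernel launch.
  cudaError_t err =
      cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (int i = 0; i < n; ++i) {
    float expected = (i + 1 < n) ? h_in[i + 1] : h_in[i];
    if (h_out[i] != expected) return false;
  }
  return true;
}
```

Calling `run_shift_demo(20)` from `main` exercises the partially active case.
The catch for code generation is that the ballot has to be hoisted to a point
where the whole warp is known to be converged, which is the hard part of
upgrading `__shfl` to `__shfl_sync` automatically.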
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]