lazycal opened a new pull request, #11803:
URL: https://github.com/apache/tvm/pull/11803
When scheduling 2D softmax, the current CUDA schedule assumes the reduction
axis is the last axis; when it is not, it produces an incorrect schedule and
raises error messages that are hard to debug. For example, running the
following snippet:
```python
import tvm
from tvm import relay
shape = (64, 2)
dtype = 'float32'
A = relay.var('A', shape=shape, dtype=dtype)
B = relay.nn.softmax(A, axis=0)
f = relay.Function([A], B)
mod = tvm.IRModule.from_expr(f)
dev = tvm.cuda()
target = tvm.target.Target('cuda')
with tvm.transform.PassContext(opt_level=0):
    executor = relay.build_module.create_executor(
        'graph', mod, dev, target).evaluate()
```
I got
```
Check failed: (!UsesVar(local_index, [this](const VarNode* var) { return var == warp_index_.get(); })) is false: LowerWarpMemory failed to rewrite load to shuffle for index ((threadIdx.x*5) + (k.inner*2))
local_index=(((threadIdx.x*5) + (k.inner*2))/32)
```
with `opt_level=0` and
```
Check failed: (match) is false: iter_var(blockIdx.x, , blockIdx.x) domain already inferred, cannot prove their extents are the same 64 vs 2
```
with `opt_level=4`.
This PR fixes all the CUDA 2D softmax schedules to also support axis=0, and
enhances the unit tests to cover every reduction axis.
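For reference, softmax over a non-last axis is straightforward to check against a plain NumPy implementation. The sketch below (`softmax_ref` is a hypothetical helper, not code from this PR) shows the kind of axis-parametrized reference the enhanced unit tests compare TVM's output against:

```python
import numpy as np

def softmax_ref(x, axis):
    """Numerically stable softmax along `axis` (hypothetical reference helper)."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

# Same shape as the repro above: with axis=0 the reduction extent is 64,
# not the last-axis extent of 2, which is the case the old schedule mishandled.
x = np.random.randn(64, 2).astype("float32")
for axis in (0, 1):
    out = softmax_ref(x, axis)
    # every slice along the reduction axis must sum to 1
    np.testing.assert_allclose(out.sum(axis=axis),
                               np.ones_like(out.sum(axis=axis)),
                               rtol=1e-5)
```

Looping the test over both axes (rather than only the default last axis) is what surfaces schedules that hard-code the reduction position.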