zxybazh commented on PR #14020:
URL: https://github.com/apache/tvm/pull/14020#issuecomment-1434257456
Hi, thanks for reviewing my PR this late! Very good questions!
Let me share some of the context here. We are trying to support a dynamic-shape
operator on CUDA. This function is generated during a relax pass called
`VMShapeLower`, which is part of the relax build, and it produces a PrimFunc
like the following:
```python
@T.prim_func
def shape_func(H: T.Buffer((T.int64(3),), "int64")):
T.func_attr({"global_symbol": "shape_func"})
H[T.int64(2)] = T.int64(4) * H[T.int64(0)] * H[T.int64(1)]
```
This function is supposed to run on the CPU, i.e., the host, rather than the
device. However, since this pass does not have access to target information,
the generated function does not carry a target in its attributes. We would
therefore like to add an attribute so that `BindTarget` automatically binds
the function to the target host, preventing it from being split out as device
code.
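To make the intent concrete, here is a minimal, hypothetical sketch (plain Python, not TVM's actual implementation) of how a `BindTarget`-style pass could consult a marker attribute to decide whether to bind a function to the host target or the device target. The attribute name `tir.is_host_func` and the dict-based stand-in for a PrimFunc's attribute map are illustrative assumptions:

```python
# Hypothetical sketch of BindTarget-style dispatch; dicts stand in for a
# PrimFunc's attribute map, and strings stand in for tvm.target.Target.

def bind_target(func_attrs, target, target_host):
    """Attach a target to a function's attributes unless one is present."""
    if "target" in func_attrs:
        # Already bound explicitly; leave it untouched.
        return func_attrs
    # Marker attribute name is illustrative; the PR introduces a similar one.
    if func_attrs.get("tir.is_host_func"):
        return {**func_attrs, "target": target_host}
    return {**func_attrs, "target": target}

# The generated shape function carries the marker, so it binds to the host.
shape_func_attrs = {"global_symbol": "shape_func", "tir.is_host_func": True}
bound = bind_target(shape_func_attrs, "cuda", "llvm")
```

Under this scheme, any function without the marker still receives the device target as before, so the change is opt-in for host-only helpers like `shape_func`.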
For Q1, it does not fail because this pass runs after `BindTarget`. Thanks for
the tip! <s>Will remove this change in the verify-memory pass.</s> I have
reverted the change.
For Q2, given the context, IMHO if the pass could access target information
and perform the target binding itself, this new attribute could be avoided.
I'm not quite sure whether it is expected to add the target as an argument to
a particular pass, though.
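For comparison, here is a hypothetical sketch (again plain Python, not TVM code) of the alternative raised in Q2: parameterizing the lowering pass with the host target at construction time, so the shape function is bound when it is created and no marker attribute is needed. The class name and dict-based attributes are illustrative assumptions:

```python
# Hypothetical sketch: the lowering pass receives the host target up front
# and attaches it directly, instead of relying on a later BindTarget pass
# to recognize a marker attribute.

class VMShapeLowerSketch:
    def __init__(self, target_host):
        # target_host is a stand-in for a tvm.target.Target.
        self.target_host = target_host

    def run(self, func_attrs):
        # Bind the generated shape function to the host immediately.
        return {**func_attrs, "target": self.target_host}

lowered = VMShapeLowerSketch("llvm").run({"global_symbol": "shape_func"})
```

The trade-off is that the pass then needs the target threaded through its constructor, which is exactly the open question about whether passes are expected to take the target as an argument.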
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]