Lunderberg commented on code in PR #17145:
URL: https://github.com/apache/tvm/pull/17145#discussion_r1679659483
##########
src/relax/transform/static_plan_block_memory.cc:
##########
@@ -286,8 +286,13 @@ class TokenAllocator1D {
std::vector<StorageToken> full_pool_;
};
-/*! \brief Check if the input op is "relax.reshape". */
-bool IsReshape(const Expr& op) { return op.same_as(Op::Get("relax.reshape")); }
+/*! \brief Check if the input op is a memory op that may return the same buffer. */
Review Comment:
Thank you for the updated docstring. Looking at it, we may want to
add this as another operator attribute (e.g.
`.set_attr<Bool>("ReturnMayAliasArgument", Bool(true))`), but that could be a
follow-up PR instead.
##########
include/tvm/runtime/device_api.h:
##########
@@ -240,6 +240,10 @@ class TVM_DLL DeviceAPI {
return device_type != kDLCPU && device_type != kDLMicroDev;
}
+ static bool SupportsPointerArithmetics(int device_type) {
Review Comment:
Since we already have a vtable for `DeviceAPI`, this should be a virtual
function instead of a static function. That would allow individual `DeviceAPI`
implementations to independently mark that they support pointer arithmetic.
(It would also allow checking for driver-dependent support, such as Vulkan
support for the optional `VK_KHR_buffer_device_address` feature.)
Since host-side pointer arithmetic is not the default behavior for
`DLTensor::data`, the default implementation in `DeviceAPI` would return false,
and it could be overridden in `CPUDeviceAPI` and `CUDADeviceAPI` to return true.
##########
src/relax/op/memory/view.cc:
##########
@@ -58,9 +58,9 @@ StructInfo InferStructInfoView(const Call& call, const BlockBuilder& ctx) {
if (auto opt = sinfo.as<TensorStructInfo>()) {
return opt.value();
} else {
- LOG(FATAL) << "TypeError: "
- << "Operator " << call->op << " expects first argument to be a tensor, "
- << "but received " << arg_data << " with type " << sinfo;
+ LOG(FATAL) << "TypeError: " << "Operator " << call->op
Review Comment:
Nitpick: This looks like just a formatting change, but `clang-format` would
produce the old formatting.
##########
src/runtime/relax_vm/builtin.cc:
##########
@@ -545,6 +545,23 @@
TVM_REGISTER_GLOBAL("vm.builtin.tensor_to_shape").set_body_typed([](NDArray data
return ShapeTuple(out_shape);
});
+TVM_REGISTER_GLOBAL("vm.builtin.ensure_zero_offset").set_body_typed([](NDArray data) {
+ if (data->byte_offset == 0) {
+ return data;
+ }
+ if (DeviceAPI::SupportsPointerArithmetics(data->device.device_type)) {
Review Comment:
Can we also restrict this to `data->byte_offset % kAllocAlignment == 0`?
The alignment guarantee from TVM's allocators ensures that `NDArray->data %
kAllocAlignment == 0`. This is checked when taking ownership of a
`DLManagedTensor*`, and would raise an error in `NDArray::FromDLPack` if it
fails.
##########
python/tvm/relax/transform/transform.py:
##########
@@ -586,14 +586,14 @@ def ComputePrimValue() -> tvm.ir.transform.Pass:
return _ffi_api.ComputePrimValue() # type: ignore
-def VMBuiltinLower() -> tvm.ir.transform.Pass:
+def LowerRuntimeBuiltin() -> tvm.ir.transform.Pass:
Review Comment:
I feel silly for this request after having pushed for the name change, but
can we leave a backwards-compatibility wrapper for `VMBuiltinLower`?
Otherwise, we would break existing external code (e.g.
[here](https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/compiler_pass/pipeline.py#L173)
in MLC-LLM). I'm thinking something like the following:
```python
import warnings

def VMBuiltinLower():
    warnings.warn(
        "tvm.relax.transform.VMBuiltinLower has been renamed to 'LowerRuntimeBuiltin'. "
        "This wrapper is for backwards compatibility, and will be removed in a later update."
    )
    return LowerRuntimeBuiltin()
```
That way, we have a period of time when either `VMBuiltinLower()` or
`LowerRuntimeBuiltin()` can be used interchangeably. During this period,
external code can update to the latest TVM without breakage, but will receive a
warning whenever they use `VMBuiltinLower()`. Removing `VMBuiltinLower()`
entirely after the grace period would then be much less painful.
##########
include/tvm/runtime/device_api.h:
##########
@@ -240,6 +240,10 @@ class TVM_DLL DeviceAPI {
return device_type != kDLCPU && device_type != kDLMicroDev;
}
+ static bool SupportsPointerArithmetics(int device_type) {
Review Comment:
Also, a nitpick: this isn't whether the device supports pointer arithmetic,
but whether pointer arithmetic on a device-owned `void* DLTensor::data` may be
performed on the host. The TVM backends for both Vulkan and OpenCL support
pointer arithmetic, but only within the generated kernels; neither supports
pointer arithmetic performed on the host.
##########
tests/python/relax/test_op_view.py:
##########
@@ -483,18 +483,7 @@ def main(A: R.Tensor([4096], "float32")):
class Expected:
@R.function
def main(A: R.Tensor([4096], "float32")):
- B = R.ExternFunc(
- "runtime.TVMArrayCreateView",
- R.Callable(
- derive_func="tvm.relax.struct_info.infer_view_sinfo",
- purity=True,
- ),
- )(
- A,
- R.shape([64, 64]),
- R.dtype("float32"),
- R.prim_value(0),
- )
+ B = R.memory.view(A, shape=R.shape([64, 64]), dtype="float32", relative_byte_offset=0)
return B
After = tvm.relax.transform.LegalizeOps()(Before)
Review Comment:
After replacing the `.set_attr<FLegalize>` for `R.memory.view` with
`.set_attr<FLowerBuiltin>`, the changes to these unit tests can be reverted.
Instead, any use of `LegalizeOps` in the unit tests would instead call
`LowerRuntimeBuiltin`.
##########
src/relax/op/memory/view.cc:
##########
@@ -334,13 +334,12 @@ Expr LegalizeView(const BlockBuilder& bb, const Call& call) {
relative_byte_offset = relax::PrimValue::Int64(0);
}
- StructInfoDeriveFunc infer_sinfo_env_func;
- infer_sinfo_env_func = EnvFunc::Get("tvm.relax.struct_info.infer_view_sinfo");
- auto runtime_view_sinfo = FuncStructInfo::OpaqueFunc(infer_sinfo_env_func, true);
-
- ExternFunc runtime_view_func("runtime.TVMArrayCreateView", runtime_view_sinfo);
+ if (shape.same_as(call->args[1]) && dtype.same_as(call->args[2]) &&
+     relative_byte_offset.same_as(call->args[3])) {
+   return call;
+ }
+ if (shape.same_as(call->args[1]) && dtype.same_as(call->args[2]) &&
+ relative_byte_offset.same_as(call->args[3])) {
+ return call;
+ }
- return Call(runtime_view_func, {data, shape, dtype, relative_byte_offset});
+ return Call(call->op, {data, shape, dtype, relative_byte_offset});
Review Comment:
I don't think we want to infer the shape/dtype prior to lowering,
because it could result in unexpected StructInfo inference later on.
Suppose we have `R.memory.view(arg, shape=[16])`. This returns a view into
the first 16 elements of `arg`, without changing the dtype. If an `IRModule`
pass updates the datatype of `arg`, then that new datatype should also
propagate to the view. However, legalizing it to `R.memory.view(arg,
shape=[16], dtype="float16")` would return a view into `arg` that interprets the
first 32 bytes as if they were `"float16"`; if an `IRModule` pass then updates
the datatype of `arg`, the view would still be `"float16"`. To avoid this
issue, the unknown arguments shouldn't be filled in until the lowering is about
to occur.
What if we were to remove `.set_attr<FLegalize>` altogether, and only have
`.set_attr<FLowerBuiltin>`? That way, we preserve the `R.memory.view` as-is
until it is ready to be lowered. The `LegalizeOps` pass would then be a no-op
for `R.memory.view`, and only the `LowerRuntimeBuiltin` pass would change it at
all.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]