Lunderberg commented on code in PR #17145:
URL: https://github.com/apache/tvm/pull/17145#discussion_r1679659483
##########
src/relax/transform/static_plan_block_memory.cc:
##########
@@ -286,8 +286,13 @@ class TokenAllocator1D {
std::vector<StorageToken> full_pool_;
};
-/*! \brief Check if the input op is "relax.reshape". */
-bool IsReshape(const Expr& op) { return op.same_as(Op::Get("relax.reshape")); }
+/*! \brief Check if the input op is a memory op that may return the same buffer. */
Review Comment:
Thank you for the updated docstring. Looking at it, we may want to
add this as another operator attribute (e.g.
`.set_attr<Bool>("ReturnMayAliasArgument", Bool(true))`), but that could be a
follow-up PR instead.
##########
include/tvm/runtime/device_api.h:
##########
@@ -240,6 +240,10 @@ class TVM_DLL DeviceAPI {
return device_type != kDLCPU && device_type != kDLMicroDev;
}
+ static bool SupportsPointerArithmetics(int device_type) {
Review Comment:
Since we already have a vtable for `DeviceAPI`, this should be a virtual
function instead of a static function. That would allow individual `DeviceAPI`
implementations to independently mark that they support pointer arithmetic.
(It would also allow checking for driver-dependent support, such as Vulkan
support for the optional `VK_KHR_buffer_device_address` feature.)
Since host-side pointer arithmetic is not the default behavior for
`DLTensor::data`, the default implementation in `DeviceAPI` would return false,
and it could be overridden in `CPUDeviceAPI` and `CUDADeviceAPI` to return true.
##########
src/relax/op/memory/view.cc:
##########
@@ -58,9 +58,9 @@ StructInfo InferStructInfoView(const Call& call, const BlockBuilder& ctx) {
if (auto opt = sinfo.as<TensorStructInfo>()) {
return opt.value();
} else {
- LOG(FATAL) << "TypeError: "
- << "Operator " << call->op << " expects first argument to be a tensor, "
- << "but received " << arg_data << " with type " << sinfo;
+ LOG(FATAL) << "TypeError: " << "Operator " << call->op
Review Comment:
Nitpick: This looks like just a formatting change, but `clang-format` would
produce the old formatting.
##########
src/runtime/relax_vm/builtin.cc:
##########
@@ -545,6 +545,23 @@
TVM_REGISTER_GLOBAL("vm.builtin.tensor_to_shape").set_body_typed([](NDArray data
return ShapeTuple(out_shape);
});
+TVM_REGISTER_GLOBAL("vm.builtin.ensure_zero_offset").set_body_typed([](NDArray data) {
+ if (data->byte_offset == 0) {
+ return data;
+ }
+ if (DeviceAPI::SupportsPointerArithmetics(data->device.device_type)) {
Review Comment:
Can we also restrict this to `data->byte_offset % kAllocAlignment == 0`?
The alignment guarantee from TVM's allocators ensures that `NDArray->data %
kAllocAlignment == 0`. This is checked when taking ownership of a
`DLManagedTensor*`, and would raise an error in `NDArray::FromDLPack` if it
fails.
##########
python/tvm/relax/transform/transform.py:
##########
@@ -586,14 +586,14 @@ def ComputePrimValue() -> tvm.ir.transform.Pass:
return _ffi_api.ComputePrimValue() # type: ignore
-def VMBuiltinLower() -> tvm.ir.transform.Pass:
+def LowerRuntimeBuiltin() -> tvm.ir.transform.Pass:
Review Comment:
I feel silly for this request after having pushed for the name change, but
can we leave a backwards-compatibility wrapper for `VMBuiltinLower`?
Otherwise, we would break existing external code (e.g.
[here](https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/compiler_pass/pipeline.py#L173)
in MLC-LLM). I'm thinking something like the following:
```python
import warnings

def VMBuiltinLower():
    warnings.warn(
        "tvm.relax.transform.VMBuiltinLower has been renamed to 'LowerRuntimeBuiltin'. "
        "This wrapper is for backwards compatibility, and will be removed in a later update."
    )
    return LowerRuntimeBuiltin()
```
That way, we have a period of time when either `VMBuiltinLower()` or
`LowerRuntimeBuiltin()` can be used interchangeably. During this period,
external code can update to the latest TVM without breakage, but will receive a
warning whenever they use `VMBuiltinLower()`. Removing `VMBuiltinLower()`
entirely after the grace period would then be much less painful.
##########
include/tvm/runtime/device_api.h:
##########
@@ -240,6 +240,10 @@ class TVM_DLL DeviceAPI {
return device_type != kDLCPU && device_type != kDLMicroDev;
}
+ static bool SupportsPointerArithmetics(int device_type) {
Review Comment:
Also, a nitpick: this isn't whether the device supports pointer arithmetic,
but whether pointer arithmetic on a device-owned `void* DLTensor::data` may be
performed on the host. The TVM backends for both Vulkan and OpenCL support
pointer arithmetic, but only within the generated kernels; neither supports
pointer arithmetic performed on the host.
##########
tests/python/relax/test_op_view.py:
##########
@@ -483,18 +483,7 @@ def main(A: R.Tensor([4096], "float32")):
class Expected:
@R.function
def main(A: R.Tensor([4096], "float32")):
- B = R.ExternFunc(
- "runtime.TVMArrayCreateView",
- R.Callable(
- derive_func="tvm.relax.struct_info.infer_view_sinfo",
- purity=True,
- ),
- )(
- A,
- R.shape([64, 64]),
- R.dtype("float32"),
- R.prim_value(0),
- )
+ B = R.memory.view(A, shape=R.shape([64, 64]), dtype="float32", relative_byte_offset=0)
return B
After = tvm.relax.transform.LegalizeOps()(Before)
Review Comment:
After replacing the `.set_attr<FLegalize>` for `R.memory.view` with
`.set_attr<FLowerBuiltin>`, the changes to these unit tests can be reverted.
Instead, any use of `LegalizeOps` in the unit tests would instead call
`LowerRuntimeBuiltin`.
##########
src/relax/op/memory/view.cc:
##########
@@ -334,13 +334,12 @@ Expr LegalizeView(const BlockBuilder& bb, const Call& call) {
relative_byte_offset = relax::PrimValue::Int64(0);
}
- StructInfoDeriveFunc infer_sinfo_env_func;
- infer_sinfo_env_func = EnvFunc::Get("tvm.relax.struct_info.infer_view_sinfo");
- auto runtime_view_sinfo = FuncStructInfo::OpaqueFunc(infer_sinfo_env_func, true);
-
- ExternFunc runtime_view_func("runtime.TVMArrayCreateView", runtime_view_sinfo);
+ if (shape.same_as(call->args[1]) && dtype.same_as(call->args[2]) &&
+     relative_byte_offset.same_as(call->args[3])) {
+   return call;
+ }
+ if (shape.same_as(call->args[1]) && dtype.same_as(call->args[2]) &&
+ relative_byte_offset.same_as(call->args[3])) {
+ return call;
+ }
- return Call(runtime_view_func, {data, shape, dtype, relative_byte_offset});
+ return Call(call->op, {data, shape, dtype, relative_byte_offset});
Review Comment:
I don't think we want to infer the shape/dtype prior to lowering,
because it could result in unexpected StructInfo inference later on.
Suppose we have `R.memory.view(arg, shape=[16])`. This returns a view into
the first 16 elements of `arg`, without changing the dtype. If an `IRModule`
pass updates the datatype of `arg`, then that new datatype should also
propagate to the view. However, legalizing it to `R.memory.view(arg,
shape=[16], dtype="float16")` would return a view into `arg` that interprets the
first 32 bytes as if they were `"float16"`; if an `IRModule` pass then updates
the datatype of `arg`, the view would still be `"float16"`. To avoid this
issue, the unknown arguments shouldn't be filled in until the lowering is about
to occur.
What if we were to remove `.set_attr<FLegalize>` altogether, and only have
`.set_attr<FLowerBuiltin>`? That way, we preserve the `R.memory.view` as-is
until it is ready to be lowered. The `LegalizeOps` pass would then be a no-op
for `R.memory.view`, and only the `LowerRuntimeBuiltin` pass would change it at
all.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]