happyme531 opened a new issue, #16530:
URL: https://github.com/apache/tvm/issues/16530

   - If a Graph executor model has too many nodes, `tvmc run` with the `--profile` option hangs; tracing its OpenCL calls shows a CL_OUT_OF_HOST_MEMORY error on the `clFlush` call that follows a large number of `clEnqueueNDRangeKernel` / `clSetKernelArg` calls. Running the same model normally, without profiling, works (see the pyopencl sketch after this list for a stand-alone illustration of the pattern).
   - If the model has even more nodes, running it normally fails with the same error as well.
   
   - The cause is most likely the same as in https://github.com/apache/tvm/issues/16276.
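
   A minimal stand-alone sketch of that pattern, assuming pyopencl is available (untested on this board, and not part of TVM): it queues thousands of small kernels before a single finish, similar to what the graph executor ends up doing for a model with thousands of nodes.
   ```python
   # Hypothetical stand-alone check, outside TVM: enqueue many small kernels
   # without an intermediate flush, then synchronize once at the end.
   import numpy as np
   import pyopencl as cl

   ctx = cl.create_some_context()
   queue = cl.CommandQueue(ctx)

   src = """
   __kernel void vadd(__global float *a, __global const float *b) {
       int i = get_global_id(0);
       a[i] += b[i];
   }
   """
   prg = cl.Program(ctx, src).build()

   mf = cl.mem_flags
   a = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR,
                 hostbuf=np.zeros(1024, dtype=np.float32))
   b = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                 hostbuf=np.ones(1024, dtype=np.float32))

   # Queue many kernels back to back; on the failing setup the error should
   # surface at (or before) the final finish below.
   for _ in range(5000):
       prg.vadd(queue, (1024,), None, a, b)
   queue.finish()
   ```
   If this snippet fails in the same way, the limit is likely in the driver's command queue itself rather than in TVM.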
   
   ### Expected behavior
   
   Models run normally.
   
   ### Actual behavior
   
   The model hangs when profiling...
   ... or fails at run time with a CL_OUT_OF_HOST_MEMORY error.
   
   ```
   2024-02-07 02:39:55.381 INFO load_module /tmp/tmp86s5qxl8/mod.so
   arm_release_ver: g13p0-01eac0, rk_so_ver: 6
   Traceback (most recent call last):
     File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
       return _run_code(code, main_globals, None,
     File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
       exec(code, run_globals)
     File "/home/firefly/tvm/python/tvm/driver/tvmc/__main__.py", line 24, in 
<module>
       tvmc.main.main()
     File "/home/firefly/tvm/python/tvm/driver/tvmc/main.py", line 118, in main
       sys.exit(_main(sys.argv[1:]))
     File "/home/firefly/tvm/python/tvm/driver/tvmc/main.py", line 106, in _main
       return args.func(args)
     File "/home/firefly/tvm/python/tvm/driver/tvmc/runner.py", line 282, in 
drive_run
       result = run_module(
     File "/home/firefly/tvm/python/tvm/driver/tvmc/runner.py", line 706, in 
run_module
       times = module.benchmark(dev, number=number, repeat=repeat, 
end_to_end=end_to_end)
     File "/home/firefly/tvm/python/tvm/contrib/graph_executor.py", line 505, 
in benchmark
       return self.module.time_evaluator(
     File "/home/firefly/tvm/python/tvm/runtime/module.py", line 397, in 
evaluator
       blob = feval(*args)
     File "/home/firefly/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 239, 
in __call__
       raise_last_ffi_error()
     File "/home/firefly/tvm/python/tvm/_ffi/base.py", line 481, in 
raise_last_ffi_error
       raise py_err
     File "/home/firefly/tvm/src/runtime/rpc/rpc_module.cc", line 291, in 
tvm::runtime::RPCWrappedFunc::WrapRemoteReturnToValue(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const::$_0::operator()(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const
       *rv = PackedFunc([wf](TVMArgs args, TVMRetValue* rv) { return 
wf->operator()(args, rv); });
     File "/home/firefly/tvm/src/runtime/rpc/rpc_module.cc", line 129, in 
tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const
       sess_->CallFunc(handle_, values.data(), type_codes.data(), args.size(), 
set_return);
     File "/home/firefly/tvm/src/runtime/rpc/rpc_local_session.cc", line 91, in 
tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, 
std::function<void (tvm::runtime::TVMArgs)> const&)
       pf->CallPacked(TVMArgs(arg_values, arg_type_codes, num_args), &rv);
     File "/home/firefly/tvm/src/runtime/profiling.cc", line 888, in 
tvm::runtime::profiling::WrapTimeEvaluator(tvm::runtime::PackedFunc, DLDevice, 
int, int, int, int, int, int, int, 
tvm::runtime::PackedFunc)::$_0::operator()(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*)
       DeviceAPI::Get(dev)->StreamSync(dev, nullptr);
     File "/home/firefly/tvm/src/runtime/opencl/opencl_device_api.cc", line 
387, in tvm::runtime::cl::OpenCLWorkspace::StreamSync(DLDevice, void*)
       OPENCL_CALL(clFinish(this->GetQueue(dev)));
   tvm.error.InternalError: Traceback (most recent call last):
     4: 
tvm::runtime::RPCWrappedFunc::WrapRemoteReturnToValue(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const::$_0::operator()(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const
           at /home/firefly/tvm/src/runtime/rpc/rpc_module.cc:291
     3: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const
           at /home/firefly/tvm/src/runtime/rpc/rpc_module.cc:129
     2: tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int 
const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
           at /home/firefly/tvm/src/runtime/rpc/rpc_local_session.cc:91
     1: tvm::runtime::profiling::WrapTimeEvaluator(tvm::runtime::PackedFunc, 
DLDevice, int, int, int, int, int, int, int, 
tvm::runtime::PackedFunc)::$_0::operator()(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*)
           at /home/firefly/tvm/src/runtime/profiling.cc:888
     0: tvm::runtime::cl::OpenCLWorkspace::StreamSync(DLDevice, void*)
           at /home/firefly/tvm/src/runtime/opencl/opencl_device_api.cc:387
     File "/home/firefly/tvm/src/runtime/opencl/opencl_device_api.cc", line 387
   InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, 
code=-6: CL_OUT_OF_HOST_MEMORY
   ```
   
   ### Environment
   
   RK3588 SoC with Mali-G610 MP4 GPU
   ARM vendor GPU driver, OpenCL 3.0
   Debian 11
   TVM master branch
   
   ### Steps to reproduce
   
   
   
   - Create a model with many nodes (I do not know how to use the TVM Relay graph API yet, and sadly neither does ChatGPT, so ONNX is used instead; a rough Relay sketch is included after this list):
   ```python
   import onnx
   from onnx import helper, TensorProto
   import numpy
   
   cnt = 250
   
   # Create input tensor
   input_shape = [1, 1024]
   input_name = "input"
   input_tensor = helper.make_tensor_value_info(input_name, TensorProto.FLOAT, input_shape)
   
   # Create constant tensor
   constant_shape = [1, 1024]
   initializer = helper.make_tensor("constant", TensorProto.FLOAT, constant_shape, numpy.ones(constant_shape))
   
   # Create vector add nodes
   nodes = []
   for i in range(cnt):
       node_name = f"add_{i}"
       node = helper.make_node("Add", inputs=[input_name, "constant"], outputs=[node_name], name=node_name)
       input_name = node_name
       nodes.append(node)
       # add an identity node to avoid the OOM
       node_name = f"identity_{i}"
       node = helper.make_node("Identity", inputs=[input_name], outputs=[node_name], name=node_name)
       input_name = node_name
       nodes.append(node)
   
   # Create output tensor
   output_name = nodes[-1].output[0]
   output_tensor = helper.make_tensor_value_info(output_name, TensorProto.FLOAT, input_shape)
   
   # Set opset version to 16
   opset_version = 16
   opset_imports = [helper.make_opsetid("", opset_version)]
   
   # Create ONNX model
   model = helper.make_model(
       helper.make_graph(
           nodes,
           "vector_add_model",
           [input_tensor],
           [output_tensor],
           initializer=[initializer]
       ),
       producer_name="vector_add_model",
       opset_imports=opset_imports
   )
   
   # Save ONNX model to a file
   onnx.save(model, "vector_add_model(" + str(cnt*2) + " nodes).onnx")
   ```
   (or just download 
[vector_add_model.zip](https://github.com/apache/tvm/files/14184798/vector_add_model.zip))
   - Compile the model with 500 nodes and run it with profiling:
   `tvmc compile --target "opencl -device=mali" --output test500.tar -O 0 --dump-code relay,tir "vector_add_model(500 nodes).onnx"`
   `python -m tvm.driver.tvmc run --print-time --device cl --repeat 4 --profile ./test500.tar`
   (it works)
   - Try again with 2000 nodes: hangs with `--profile`, but runs fine without it.
   - Try again with 5000 nodes: hangs with `--profile`, and the normal run fails as well.
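
   For reference, a rough, untested sketch of how a similar chain-of-adds model might be built directly with the Relay Python API instead of ONNX (the calls below exist in current TVM, but whether the resulting graph keeps one node per add at `-O 0` has not been verified):
   ```python
   # Hypothetical Relay-based equivalent of the ONNX generator above.
   import numpy as np
   import tvm
   from tvm import relay

   cnt = 250  # number of adds, as in the ONNX script

   x = relay.var("input", shape=(1, 1024), dtype="float32")
   c = relay.const(np.ones((1, 1024), dtype="float32"))

   out = x
   for _ in range(cnt):
       out = relay.add(out, c)

   mod = tvm.IRModule.from_expr(relay.Function([x], out))

   # Build for the same target used in the tvmc commands above.
   target = tvm.target.Target("opencl -device=mali")
   with tvm.transform.PassContext(opt_level=0):
       lib = relay.build(mod, target=target)
   ```
   Note that the ONNX version interleaves Identity nodes between the adds; whether that trick is needed (or has an equivalent) on the Relay side has not been checked.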
   
   ### Triage
   
   * backend: opencl
   * flow: graph
   

