happyme531 opened a new issue, #16530: URL: https://github.com/apache/tvm/issues/16530
- If a graph executor model has too many nodes, `tvmc run` with the `--profile` option hangs. Tracing its OpenCL calls shows a CL_OUT_OF_HOST_MEMORY error on the `clFlush` call after a large number of `clEnqueueNDRangeKernel` / `clSetKernelArg` calls. Running the same model normally, without profiling, works.
- If the model has even more nodes, running it normally also fails with the same error.
- The cause is likely the same as https://github.com/apache/tvm/issues/16276

### Expected behavior

Models run normally.

### Actual behavior

The model gets stuck when profiled...

...or fails to run with a CL_OUT_OF_HOST_MEMORY error:

```
2024-02-07 02:39:55.381 INFO load_module /tmp/tmp86s5qxl8/mod.so
arm_release_ver: g13p0-01eac0, rk_so_ver: 6
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/firefly/tvm/python/tvm/driver/tvmc/__main__.py", line 24, in <module>
    tvmc.main.main()
  File "/home/firefly/tvm/python/tvm/driver/tvmc/main.py", line 118, in main
    sys.exit(_main(sys.argv[1:]))
  File "/home/firefly/tvm/python/tvm/driver/tvmc/main.py", line 106, in _main
    return args.func(args)
  File "/home/firefly/tvm/python/tvm/driver/tvmc/runner.py", line 282, in drive_run
    result = run_module(
  File "/home/firefly/tvm/python/tvm/driver/tvmc/runner.py", line 706, in run_module
    times = module.benchmark(dev, number=number, repeat=repeat, end_to_end=end_to_end)
  File "/home/firefly/tvm/python/tvm/contrib/graph_executor.py", line 505, in benchmark
    return self.module.time_evaluator(
  File "/home/firefly/tvm/python/tvm/runtime/module.py", line 397, in evaluator
    blob = feval(*args)
  File "/home/firefly/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/home/firefly/tvm/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/home/firefly/tvm/src/runtime/rpc/rpc_module.cc", line 291, in tvm::runtime::RPCWrappedFunc::WrapRemoteReturnToValue(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
    *rv = PackedFunc([wf](TVMArgs args, TVMRetValue* rv) { return wf->operator()(args, rv); });
  File "/home/firefly/tvm/src/runtime/rpc/rpc_module.cc", line 129, in tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
    sess_->CallFunc(handle_, values.data(), type_codes.data(), args.size(), set_return);
  File "/home/firefly/tvm/src/runtime/rpc/rpc_local_session.cc", line 91, in tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
    pf->CallPacked(TVMArgs(arg_values, arg_type_codes, num_args), &rv);
  File "/home/firefly/tvm/src/runtime/profiling.cc", line 888, in tvm::runtime::profiling::WrapTimeEvaluator(tvm::runtime::PackedFunc, DLDevice, int, int, int, int, int, int, int, tvm::runtime::PackedFunc)::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
    DeviceAPI::Get(dev)->StreamSync(dev, nullptr);
  File "/home/firefly/tvm/src/runtime/opencl/opencl_device_api.cc", line 387, in tvm::runtime::cl::OpenCLWorkspace::StreamSync(DLDevice, void*)
    OPENCL_CALL(clFinish(this->GetQueue(dev)));
tvm.error.InternalError: Traceback (most recent call last):
  4: tvm::runtime::RPCWrappedFunc::WrapRemoteReturnToValue(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/firefly/tvm/src/runtime/rpc/rpc_module.cc:291
  3: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/firefly/tvm/src/runtime/rpc/rpc_module.cc:129
  2: tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
        at /home/firefly/tvm/src/runtime/rpc/rpc_local_session.cc:91
  1: tvm::runtime::profiling::WrapTimeEvaluator(tvm::runtime::PackedFunc, DLDevice, int, int, int, int, int, int, int, tvm::runtime::PackedFunc)::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
        at /home/firefly/tvm/src/runtime/profiling.cc:888
  0: tvm::runtime::cl::OpenCLWorkspace::StreamSync(DLDevice, void*)
        at /home/firefly/tvm/src/runtime/opencl/opencl_device_api.cc:387
  File "/home/firefly/tvm/src/runtime/opencl/opencl_device_api.cc", line 387
InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-6: CL_OUT_OF_HOST_MEMORY
```

### Environment

- RK3588 SoC with Mali-G610 MP4 GPU
- ARM vendor GPU driver, OpenCL 3.0
- Debian 11
- TVM master branch

### Steps to reproduce

- Create a model with many nodes. (I don't know how to use the TVM Relay graph API yet, and sadly ChatGPT doesn't either, so ONNX is used instead.)

```python
import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy

cnt = 250

# Create input tensor
input_shape = [1, 1024]
input_name = "input"
input_tensor = helper.make_tensor_value_info(input_name, TensorProto.FLOAT, input_shape)

# Create constant tensor
constant_shape = [1, 1024]
initializer = helper.make_tensor("constant", TensorProto.FLOAT, constant_shape, numpy.ones(constant_shape))

# Create vector add nodes, chained one after another
nodes = []
for i in range(cnt):
    node_name = f"add_{i}"
    node = helper.make_node("Add", inputs=[input_name, "constant"], outputs=[node_name], name=node_name)
    input_name = node_name
    nodes.append(node)

    # Add an Identity node to avoid the OOM
    node_name = f"identity_{i}"
    node = helper.make_node("Identity", inputs=[input_name], outputs=[node_name], name=node_name)
    input_name = node_name
    nodes.append(node)

# Create output tensor
output_name = nodes[-1].output[0]
output_tensor = helper.make_tensor_value_info(output_name, TensorProto.FLOAT, input_shape)

# Set opset version to 16
opset_version = 16
opset_imports = [helper.make_opsetid("", opset_version)]

# Create ONNX model
model = helper.make_model(
    helper.make_graph(
        nodes,
        "vector_add_model",
        [input_tensor],
        [output_tensor],
        initializer=[initializer],
    ),
    producer_name="vector_add_model",
    opset_imports=opset_imports,
)

# Save ONNX model to a file
onnx.save(model, "vector_add_model(" + str(cnt * 2) + " nodes).onnx")
```

(or just download [vector_add_model.zip](https://github.com/apache/tvm/files/14184798/vector_add_model.zip))

- Compile and profile the model with 500 nodes (this works):

  `tvmc compile --target "opencl -device=mali" --output test500.tar -O 0 --dump-code relay,tir "vector_add_model(500 nodes).onnx"`

  `python -m tvm.driver.tvmc run --print-time --device cl --repeat 4 --profile ./test500.tar`

- Try again with 2000 nodes: stuck with `--profile`, but runs okay without it.
- Try again with 5000 nodes: stuck with `--profile`, and the normal run fails as well.

### Triage

* backend: opencl
* flow: graph
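A note on the node counts used above: each loop iteration of the generator script appends one `Add` and one `Identity` node, so a model file named "N nodes" is produced with `cnt = N / 2` (e.g. `cnt = 250` yields the 500-node model, `cnt = 2500` the 5000-node one). A stdlib-only sketch of that bookkeeping, with helper names that are illustrative and not part of the script itself:

```python
def node_count(cnt: int) -> int:
    # Each loop iteration of the generator emits one Add and one Identity node.
    return 2 * cnt

def model_filename(cnt: int) -> str:
    # Mirrors the onnx.save(...) filename in the generator script.
    return "vector_add_model(" + str(node_count(cnt)) + " nodes).onnx"

# The three sizes exercised in this report:
for cnt in (250, 1000, 2500):
    print(model_filename(cnt))
# → vector_add_model(500 nodes).onnx
# → vector_add_model(2000 nodes).onnx
# → vector_add_model(5000 nodes).onnx
```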
