PengYoun9 opened a new issue, #17546: URL: https://github.com/apache/tvm/issues/17546
### Expected behavior

The model should compile and run successfully, and the exported library should be close to the actual size of the model.

### Actual behavior

When the model is imported through the PyTorch front-end, the library produced by compilation is far too large, and loading it back on the same machine crashes with a core dump. The ONNX front-end does not have this problem.

### Environment

* OS: Ubuntu 20.04.6 LTS
* CUDA SDK version: 12.2
* TVM version: 7ae7ea836169d3cf28b05c7d0dd2cb6a2045508e
* GPU: NVIDIA A10 24GB
* Driver Version: 535.129.03
* CUDA Version: 12.2
* Torch Version: 2.2.1
* Torchvision Version: 0.17.1
* ONNX Version: 1.15.0

### Steps to reproduce

I build the library with the following code:

```python
import tvm
import torch
import torchvision
from tvm import relay

model = torchvision.models.convnext_base(weights=True)
model.eval()
dummy_input = torch.randn(2, 3, 224, 224)
trace_model = torch.jit.trace(model, dummy_input).eval()

mod, params = relay.frontend.from_pytorch(trace_model, [("input", (2, 3, 224, 224))])
target = tvm.target.cuda()
with tvm.transform.PassContext():
    lib = relay.build(mod, target, params=params)
lib.export_library("convnext.so")
```

When I then try to load the library with:

```python
module = tvm.runtime.load_module("convnext.so")
```

**Core dump!**

The exported `convnext.so` is 3.8 GB, while the original model is only 339 MB. I remember there used to be a limitation that modules larger than 3 GB could not be loaded, so I guess this might be the reason for the crash. Strangely, when I export the model to ONNX first, I do not hit this issue, and the generated library is only 340 MB:

```python
import tvm
import torch
import torchvision
import onnx
from tvm import relay

onnx_path = "convnext.onnx"
model = torchvision.models.convnext_base(weights=True)
model.eval()
input_names = ["input"]
output_names = ["output"]
dummy_input = torch.randn(2, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    onnx_path,
    input_names=input_names,
    output_names=output_names,
    opset_version=13,
)

onnx_model = onnx.load(onnx_path)
mod, params = relay.frontend.from_onnx(onnx_model, {"input": (2, 3, 224, 224)})
target = tvm.target.cuda()
with tvm.transform.PassContext():
    lib = relay.build(mod, target, params=params)
lib.export_library("convnext_onnx.so")
module = tvm.runtime.load_module("convnext_onnx.so")
```

So I suspect the PyTorch front-end may be incorrectly duplicating some constant weights when converting certain ops. A rough way to check this is sketched at the end of this report.

### Triage

* needs-triage
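Here is a minimal sketch of how one might compare the total size of the constant params returned by the two frontends; if the PyTorch frontend is already duplicating weights at import time, its total should be much larger. The `total_mb` helper is just for illustration, and this assumes `convnext.onnx` has already been exported as above:

```python
import onnx
import torch
import torchvision
from tvm import relay

model = torchvision.models.convnext_base(weights=True).eval()
dummy_input = torch.randn(2, 3, 224, 224)

# Import the same model through both frontends.
trace_model = torch.jit.trace(model, dummy_input).eval()
mod_pt, params_pt = relay.frontend.from_pytorch(
    trace_model, [("input", (2, 3, 224, 224))]
)
onnx_model = onnx.load("convnext.onnx")
mod_onnx, params_onnx = relay.frontend.from_onnx(
    onnx_model, {"input": (2, 3, 224, 224)}
)

def total_mb(params):
    # Sum the raw byte size of every constant tensor in the params dict.
    return sum(p.numpy().nbytes for p in params.values()) / (1024 ** 2)

print("torch frontend: %.1f MB in %d tensors" % (total_mb(params_pt), len(params_pt)))
print("onnx frontend:  %.1f MB in %d tensors" % (total_mb(params_onnx), len(params_onnx)))
# If the two totals match, the blow-up presumably happens later, during relay.build.
```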
