PengYoun9 opened a new issue, #17546:
URL: https://github.com/apache/tvm/issues/17546

   ### Expected behavior
   
   This model should run successfully and the generated engine file should be 
close to the actual size of the model.
   
   ### Actual behavior
   
When the model is imported through the torch frontend, the engine file generated after compilation is far larger than expected, and a core dump occurs when the file is loaded and executed on the local machine. The onnx frontend does not have this problem.
   
   ### Environment
   
   OS: "Ubuntu 20.04.6 LTS"
   CUDA SDK version: 12.2
   TVM version: 7ae7ea836169d3cf28b05c7d0dd2cb6a2045508e
   GPU: NVIDIA A10 24GB
   Driver Version: 535.129.03
   CUDA Version: 12.2
   Torch Version: 2.2.1
   Torchvision Version: 0.17.1
   Onnx Version: 1.15.0
   
   
![error](https://github.com/user-attachments/assets/cb8c3538-7705-467c-a572-a2c8d3533e37)
   
   ### Steps to reproduce
   I use the following code to get the engine:
   
   ```python
   import tvm
   import torch
   import torchvision
   from tvm import relay

   model = torchvision.models.convnext_base(weights=True)
   model.eval()

   dummy_input = torch.randn(2, 3, 224, 224)
   trace_model = torch.jit.trace(model, dummy_input).eval()

   mod, params = relay.frontend.from_pytorch(trace_model, [('input', (2, 3, 224, 224))])

   target = tvm.target.cuda()

   with tvm.transform.PassContext():
       lib = relay.build(mod, target, params=params)

   lib.export_library("convnext.so")
   ```
   
   When I try to load this engine with:

   ```python
   module = tvm.runtime.load_module("convnext.so")
   ```

   **Core dump!**
   
   I noticed that the engine file is 3.8 GB, while the original model is only 339 MB. I also remember that there used to be a limitation where modules larger than 3 GB could not be loaded, so I suspect this is the cause of the crash.
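   To see where the size comes from, a rough check I would use (my own sketch, not verified; `lib` and `convnext.so` refer to the repro script above) is to compare the serialized parameter dict against the exported library:

   ```python
   import os

   import tvm

   # Rough size check (sketch): compare the serialized parameter blob produced
   # by relay.build with the size of the exported shared library on disk.
   param_bytes = tvm.runtime.save_param_dict(lib.get_params())
   print("serialized params: %.1f MB" % (len(param_bytes) / 1024 ** 2))
   print("convnext.so:       %.1f MB" % (os.path.getsize("convnext.so") / 1024 ** 2))
   ```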
   
   Strangely, when I export the model to ONNX first, I don't encounter similar issues, and the generated engine file is 340 MB:
   ```python
   import tvm
   import torch
   import torchvision
   import onnx
   from tvm import relay

   onnx_path = "convnext.onnx"

   model = torchvision.models.convnext_base(weights=True)
   model.eval()

   input_names = ['input']
   output_names = ['output']
   dummy_input = torch.randn(2, 3, 224, 224)

   torch.onnx.export(
       model,
       dummy_input,
       onnx_path,
       input_names=input_names,
       output_names=output_names,
       opset_version=13
   )

   onnx_model = onnx.load(onnx_path)

   mod, params = relay.frontend.from_onnx(onnx_model, {'input': (2, 3, 224, 224)})

   target = tvm.target.cuda()

   with tvm.transform.PassContext():
       lib = relay.build(mod, target, params=params)

   lib.export_library("convnext_onnx.so")

   module = tvm.runtime.load_module("convnext_onnx.so")
   ```
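   For reference, the library built from the ONNX path also runs fine through the standard graph executor API (a sketch of how I execute it; `module` and `dummy_input` come from the script above):

   ```python
   from tvm.contrib import graph_executor

   # Run the loaded library on the GPU (sketch).
   dev = tvm.cuda(0)
   gmod = graph_executor.GraphModule(module["default"](dev))
   gmod.set_input("input", tvm.nd.array(dummy_input.numpy(), device=dev))
   gmod.run()
   print(gmod.get_output(0).numpy().shape)
   ```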
   
   So I suspect that the torch frontend may be duplicating some constant weights when it converts certain ops.
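   One way to check this suspicion (a sketch I have not verified; `mod_pt`/`params_pt` and `mod_onnx`/`params_onnx` are hypothetical names for the outputs of `from_pytorch` and `from_onnx` in the two scripts above) is to sum the bytes held in the params dict plus any `relay.Constant` nodes left in the module, and compare the two frontends:

   ```python
   from tvm import relay

   def weight_megabytes(mod, params):
       """Total MB held by the params dict plus Constant nodes in main (sketch)."""
       total = sum(v.numpy().nbytes for v in params.values())

       def visit(expr):
           nonlocal total
           if isinstance(expr, relay.Constant):
               total += expr.data.numpy().nbytes

       relay.analysis.post_order_visit(mod["main"], visit)
       return total / 1024 ** 2

   # mod_pt/params_pt and mod_onnx/params_onnx stand for the torch and onnx
   # import results from the scripts above.
   print("torch frontend:", weight_megabytes(mod_pt, params_pt))
   print("onnx frontend: ", weight_megabytes(mod_onnx, params_onnx))
   ```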
   
   ### Triage
   
   * needs-triage
   

