echuraev commented on PR #17190:
URL: https://github.com/apache/tvm/pull/17190#issuecomment-2246260177

   @tqchen thank you for your review! I'll work on your comments tomorrow. 
About the test case. Originally, problem appeared in disco. When we run model 
on multiple GPU and TVM was compiled with cython, then we have such crash:
   ```
   Traceback (most recent call last):
     File 
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/engine/staging_engine_worker.py",
 line 421, in run_generation_loop_worker
       model_module = model_module_loader(**model_module_loader_kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/paged_cache_model.py",
 line 136, in __init__
       model, cache_manager = init_tvm_model(model_artifact_config, 
engine_config)
                              
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/tvm_model.py",
 line 922, in init_tvm_model
       model = Model(model_artifact_config, dev)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/tvm_model.py",
 line 112, in __init__
       self.mod, self.params, self.disco_session = get_tvm_model(config, dev)
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/tvm_model_utils.py",
 line 275, in get_tvm_model
       return load_disco_module(config.model_artifact_path, lib_path, 
config.num_shards)
              
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/tvm_model_utils.py",
 line 93, in load_disco_module
       module, params = load_disco_weights_through_shard_loader(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/tvm_model_utils.py",
 line 39, in load_disco_weights_through_shard_loader
       module = sess.load_vm_module(lib_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/deps/tvm/python/tvm/runtime/disco/session.py",
 line 309, in load_vm_module
       return DModule(func(path, device, alloc_type), self)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/deps/tvm/python/tvm/runtime/disco/session.py",
 line 94, in __init__
       del dref.handle
           ^^^^^^^^^^^
   NotImplementedError: __del__
   ```
   
   The crash happen 
[here](https://github.com/apache/tvm/blob/main/python/tvm/runtime/disco/session.py#L95)
 when we try to delete `dref.handle`.
   
   I simplified the problem and we can create a simple `test.pyx` without any 
TVM specific:
   ```python
   cdef class TestClass:
       property handle:
           def __get__(self):
               print("Getter")
               return None
   
           def __set__(self, value):
               print("Setter")
               return
   ```
   The property in the code above has the same methods as exists in 
[packed_func.pxi](https://github.com/apache/tvm/blob/main/python/tvm/_ffi/_cython/packed_func.pxi#L311-L318).
   
   Next, create a `setup.py` file: 
   ```python
   from setuptools import setup
   from Cython.Build import cythonize
   
   setup(
       ext_modules=cythonize("test.pyx"),
   )
   ```
   
   And build it by the following command:
   ```bash
   python setup.py build_ext --inplace
   ```
   
   After that, we can reproduce original issue: 
   ```python
   $ python3
   >>> import test
   >>> t = test.TestClass()
   >>> del t.handle
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   NotImplementedError: __del__
   >>>
   ```
   
   If we add an implementation of `__del__` method to `test.pyx` than we won't 
have this exception:
   - Add this method to `TestClass`:
   ```
           def __del__(self):
               print("Dtor")
               return
   ```
   - Recompile the code and run the reproducer once again:
   ```python
   $ python3
   >>> import test
   >>> t = test.TestClass()
   >>> del t.handle
   Dtor
   >>>
   ```
   
   And after adding implementation of `__del__` methods to the ffi files, the 
crash in disco also was fixed. If it is necessary, I can try to create a simple 
unit test tomorrow, but not sure how to integrate cython tests to the TVM 
infrastructure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to