echuraev commented on PR #17190:
URL: https://github.com/apache/tvm/pull/17190#issuecomment-2246260177
@tqchen thank you for your review! I'll work on your comments tomorrow.
About the test case. Originally, problem appeared in disco. When we run model
on multiple GPU and TVM was compiled with cython, then we have such crash:
```
Traceback (most recent call last):
File
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/engine/staging_engine_worker.py",
line 421, in run_generation_loop_worker
model_module = model_module_loader(**model_module_loader_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/paged_cache_model.py",
line 136, in __init__
model, cache_manager = init_tvm_model(model_artifact_config,
engine_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/tvm_model.py",
line 922, in init_tvm_model
model = Model(model_artifact_config, dev)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/tvm_model.py",
line 112, in __init__
self.mod, self.params, self.disco_session = get_tvm_model(config, dev)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/tvm_model_utils.py",
line 275, in get_tvm_model
return load_disco_module(config.model_artifact_path, lib_path,
config.num_shards)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/tvm_model_utils.py",
line 93, in load_disco_module
module, params = load_disco_weights_through_shard_loader(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/mlc-serve/mlc_serve/model/tvm_model_utils.py",
line 39, in load_disco_weights_through_shard_loader
module = sess.load_vm_module(lib_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/deps/tvm/python/tvm/runtime/disco/session.py",
line 309, in load_vm_module
return DModule(func(path, device, alloc_type), self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/opt/dlami/nvme/echuraev/Workspace/OctoML/ollm-lora/deps/tvm/python/tvm/runtime/disco/session.py",
line 94, in __init__
del dref.handle
^^^^^^^^^^^
NotImplementedError: __del__
```
The crash happen
[here](https://github.com/apache/tvm/blob/main/python/tvm/runtime/disco/session.py#L95)
when we try to delete `dref.handle`.
I simplified the problem and we can create a simple `test.pyx` without any
TVM specific:
```python
cdef class TestClass:
property handle:
def __get__(self):
print("Getter")
return None
def __set__(self, value):
print("Setter")
return
```
The property in the code above has the same methods as exists in
[packed_func.pxi](https://github.com/apache/tvm/blob/main/python/tvm/_ffi/_cython/packed_func.pxi#L311-L318).
Next, create a `setup.py` file:
```python
from setuptools import setup
from Cython.Build import cythonize
setup(
ext_modules=cythonize("test.pyx"),
)
```
And build it by the following command:
```bash
python setup.py build_ext --inplace
```
After that, we can reproduce original issue:
```python
$ python3
>>> import test
>>> t = test.TestClass()
>>> del t.handle
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: __del__
>>>
```
If we add an implementation of `__del__` method to `test.pyx` than we won't
have this exception:
- Add this method to `TestClass`:
```
def __del__(self):
print("Dtor")
return
```
- Recompile the code and run the reproducer once again:
```python
$ python3
>>> import test
>>> t = test.TestClass()
>>> del t.handle
Dtor
>>>
```
And after adding implementation of `__del__` methods to the ffi files, the
crash in disco also was fixed. If it is necessary, I can try to create a simple
unit test tomorrow, but not sure how to integrate cython tests to the TVM
infrastructure.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]