trevor-m opened a new issue #6673: URL: https://github.com/apache/incubator-tvm/issues/6673
This issue came up in the PR adding TensorRT BYOC support: https://github.com/apache/incubator-tvm/pull/6395#issuecomment-707363920

Example failed CI run: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-6395/30/pipeline

It seems that enabling TRT BYOC codegen (`set(USE_TENSORRT_CODEGEN ON)`) exposed an unrelated bug, surfaced by `apps/bundle_deploy/bundle_deploy.py` during `tests/scripts/task_cpp_unittest.sh`. The Python program segfaults when run. We believe the issue is not with this test itself; it just happens to be the first thing that starts a TVM Python session and exits after building a model.

I reproduced the error inside GDB to get the backtrace:

```
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x6) at malloc.c:2958
2958    malloc.c: No such file or directory.
(gdb) bt
#0  __GI___libc_free (mem=0x6) at malloc.c:2958
#1  0x00007fffde4937f4 in dmlc::parameter::FieldAccessEntry::~FieldAccessEntry() () from /workspace/build/libtvm.so
#2  0x00007fff9702a4af in dmlc::parameter::FieldEntry<std::string>::~FieldEntry() () from /usr/local/lib/python3.6/dist-packages/xgboost/./lib/libxgboost.so
#3  0x00007fff97037267 in dmlc::parameter::ParamManager::~ParamManager() () from /usr/local/lib/python3.6/dist-packages/xgboost/./lib/libxgboost.so
#4  0x00007ffff6cd7008 in __run_exit_handlers (status=0, listp=0x7ffff70615f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#5  0x00007ffff6cd7055 in __GI_exit (status=<optimized out>) at exit.c:104
#6  0x00007ffff6cbd847 in __libc_start_main (main=0x4d1cb0 <main>, argc=5, argv=0x7fffffffe858, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe848) at ../csu/libc-start.c:325
#7  0x00000000005e8569 in _start ()
```

Since [TVM's setup.py requires a minimum XGBoost version of 1.1.0](https://github.com/apache/incubator-tvm/blob/main/python/setup.py#L172), I checked the CI docker image and found it only has 1.0.2.
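To make the version gap concrete, here is a small sketch comparing the CI image's xgboost version against the setup.py minimum. The `meets_minimum` helper is hypothetical (written for illustration, not part of TVM); the version numbers are the ones from the issue.

```python
def meets_minimum(installed, minimum="1.1.0"):
    """Compare dotted version strings numerically (hypothetical helper).

    1.1.0 is the minimum declared in TVM's setup.py; the CI docker
    image currently ships 1.0.2, which fails this check.
    """
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(installed) >= to_tuple(minimum)

print(meets_minimum("1.0.2"))  # CI image version -> False
print(meets_minimum("1.2.0"))  # -> True
```

A real fix would pin the CI container's xgboost to a version that passes this check, rather than checking at runtime.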
I tried 1.1.0 and 1.2.0 and found that both fixed the issue:

```
ubuntu@ip-172-31-83-183:~/apps/bundle_deploy$ python3 build_model.py -o build --test
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
Segmentation fault (core dumped)
ubuntu@ip-172-31-83-183:~/apps/bundle_deploy$ python3
Python 3.6.10 (default, Dec 19 2019, 23:04:32)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xgboost
>>> xgboost.__version__
'1.0.2'
>>> exit()
ubuntu@ip-172-31-83-183:~/apps/bundle_deploy$ pip3 install --user xgboost==1.1.0
ubuntu@ip-172-31-83-183:~/apps/bundle_deploy$ python3 build_model.py -o build --test
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
ubuntu@ip-172-31-83-183:~/apps/bundle_deploy$ pip3 install --user xgboost==1.2.0
ubuntu@ip-172-31-83-183:~/apps/bundle_deploy$ python3 build_model.py -o build --test
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
```

The issue looks similar to this one: https://discuss.xgboost.ai/t/segfault-during-code-cleanup/1365/6

I have encountered this exact error before when using TVM together with XGBoost or Treelite in the same program and their dmlc-core commits do not match up. Perhaps dmlc-core should check for nullptr before deleting the entries?

So it looks like we can fix this by upgrading the xgboost version in the CI containers, and it would be good to make the CI version consistent with the minimum of 1.1.0 in `setup.py`. However, dmlc-core seems to have some underlying buggy behavior that this alone won't completely fix.
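The nullptr-guard idea can be mirrored in Python as idempotent cleanup. This is only an illustrative sketch of the pattern, not dmlc-core's actual code; the `FieldRegistry` class and its members are invented stand-ins for dmlc's field-entry registry.

```python
class FieldRegistry:
    """Hypothetical stand-in for a dmlc-style field-entry registry.

    The crash above happens because exit handlers from two shared
    libraries (libtvm.so and libxgboost.so) effectively tear down the
    same entries twice.  Guarding cleanup so a second run is a no-op
    is the analogue of checking for nullptr before delete in C++.
    """

    def __init__(self):
        self.entries = [object(), object()]

    def clear(self):
        if self.entries is not None:  # guard: release only once
            self.entries = None       # any later call is a no-op

reg = FieldRegistry()
reg.clear()
reg.clear()  # safe: already cleared, nothing is freed twice
print("cleanup ran safely twice")
```

Note this only makes repeated teardown of the *same* object safe; it would not help if two libraries each hold an independently allocated copy of the registry, which is why matching dmlc-core commits still matters.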
@areusch @comaniac @zhiics @tqchen

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
