trevor-m edited a comment on pull request #6395:
URL: https://github.com/apache/incubator-tvm/pull/6395#issuecomment-707363920
> @trevor-m I don't see anything weird with your build rules, but I wonder if changing the cmake config affected something. do you have >1 CI failure showing the segfault, or can you reproduce this locally?
>
> we have seen this before sporadically but don't know what causes it, and it's usually pretty hard to reproduce

Yes, every CI run with USE_TENSORRT_CODEGEN ON got the segfault (there were at least 10 runs). I can now reproduce it consistently locally by replicating the steps the CI uses (with the docker image).

It's the `apps/bundle_deploy/build_model.py` script that segfaults. I ran it under gdb inside the container with `gdb -ex r --args python3 build_model.py -o build --test`:

```
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x6) at malloc.c:2958
2958    malloc.c: No such file or directory.
(gdb) bt
#0  __GI___libc_free (mem=0x6) at malloc.c:2958
#1  0x00007fffde4937f4 in dmlc::parameter::FieldAccessEntry::~FieldAccessEntry() () from /workspace/build/libtvm.so
#2  0x00007fff9702a4af in dmlc::parameter::FieldEntry<std::string>::~FieldEntry() () from /usr/local/lib/python3.6/dist-packages/xgboost/./lib/libxgboost.so
#3  0x00007fff97037267 in dmlc::parameter::ParamManager::~ParamManager() () from /usr/local/lib/python3.6/dist-packages/xgboost/./lib/libxgboost.so
#4  0x00007ffff6cd7008 in __run_exit_handlers (status=0, listp=0x7ffff70615f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#5  0x00007ffff6cd7055 in __GI_exit (status=<optimized out>) at exit.c:104
#6  0x00007ffff6cbd847 in __libc_start_main (main=0x4d1cb0 <main>, argc=5, argv=0x7fffffffe858, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe848) at ../csu/libc-start.c:325
#7  0x00000000005e8569 in _start ()
```

I found that the issue is caused by the xgboost version in the CI docker image being too old: it had 1.0.2. I upgraded to 1.2 and the segfault is now fixed.
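
To catch this before the script crashes at exit, a small version guard could be run inside the container first. This is only a sketch: the helper names (`version_tuple`, `is_at_least`, `check_installed_xgboost`) are made up for illustration, and the `1.2.0` threshold is an assumption based solely on the observation above that 1.0.2 crashed and 1.2 did not. The exact release that fixes the problem has not been bisected.

```python
import re

# Assumption: 1.2.0 is the threshold only because 1.0.2 was seen to segfault
# and 1.2 was seen to work; the first fixed release is not known.
MIN_XGBOOST = "1.2.0"


def version_tuple(version):
    """Turn '1.0.2' or '1.2.0rc1' into a comparable tuple of three ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad so that '1.2' compares equal to '1.2.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_at_least(installed, minimum=MIN_XGBOOST):
    """True if the installed version is greater than or equal to the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_installed_xgboost():
    """Return (ok, message) for the xgboost found in the current environment."""
    try:
        import xgboost
    except ImportError:
        return True, "xgboost is not installed, nothing to check"
    found = xgboost.__version__
    if is_at_least(found):
        return True, "xgboost %s is new enough" % found
    return False, "xgboost %s is older than %s" % (found, MIN_XGBOOST)
```

Calling `check_installed_xgboost()` before `build_model.py` and failing early when `ok` is false would turn the exit-time segfault into a readable message. The comparison is kept free of third-party packages so it also runs on the Python 3.6 interpreter in the CI image.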
