stdpain opened a new issue #5213:
URL: https://github.com/apache/incubator-doris/issues/5213
**Describe the bug**
BE may occasionally hit a segmentation fault when it exits.
This bug does not affect functionality, but it may complicate
later troubleshooting, such as heap profiling.
Here is a core dump (master, debug build):
```
Core was generated by `/home/users/stdpain/opt/doris-deploy/be/lib/palo_be'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007ff97bb7d09c in
__gnu_cxx::__normal_iterator<doris::TabletManager::tablets_shard*,
std::vector<doris::TabletManager::tablets_shard,
std::allocator<doris::TabletManager::tablets_shard> > >::__normal_iterator
(this=0x7ff904d442b8, __i=<error reading variable>)
at
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_iterator.h:780
780 : _M_current(__i) { }
[Current thread is 1 (LWP 38702)]
warning: File
"/ssd1/opt/fenghaoasuch/workspace/doris/workspace/doris-toolchain/gcc730/lib64/libstdc++.so.6.0.24-gdb.py"
auto-loading has been declined by your `auto-load safe-path' set to
"$debugdir:$datadir/auto-load".
(gdb) bt
#0 0x00007ff97bb7d09c in
__gnu_cxx::__normal_iterator<doris::TabletManager::tablets_shard*,
std::vector<doris::TabletManager::tablets_shard,
std::allocator<doris::TabletManager::tablets_shard> > >::__normal_iterator
(this=0x7ff904d442b8, __i=<error reading variable>)
at
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_iterator.h:780
#1 0x00007ff97bb7b3df in std::vector<doris::TabletManager::tablets_shard,
std::allocator<doris::TabletManager::tablets_shard> >::begin (this=0x8)
at
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_vector.h:564
#2 0x00007ff97bb6ed37 in
doris::TabletManager::find_best_tablet_to_compaction (this=0x0,
compaction_type=doris::CUMULATIVE_COMPACTION, data_dir=0x55fca00,
tablet_submitted_compaction=std::vector of length 0, capacity 0)
at /home/users/stdpain/doris/core/be/src/olap/tablet_manager.cpp:681
#3 0x00007ff97ba76a83 in doris::StorageEngine::_compaction_tasks_generator
(this=0x558cc00,
compaction_type=doris::CUMULATIVE_COMPACTION,
data_dirs=std::vector of length 1, capacity 1 = {...})
at /home/users/stdpain/doris/core/be/src/olap/olap_server.cpp:397
#4 0x00007ff97ba764d5 in
doris::StorageEngine::_compaction_tasks_producer_callback (this=0x558cc00)
at /home/users/stdpain/doris/core/be/src/olap/olap_server.cpp:337
#5 0x00007ff97ba73d39 in doris::StorageEngine::<lambda()>::operator()(void)
const (
__closure=0x6fb8f18) at
/home/users/stdpain/doris/core/be/src/olap/olap_server.cpp:78
#6 0x00007ff97ba77ae1 in std::_Function_handler<void(),
doris::StorageEngine::start_bg_threads()::<lambda()> >::_M_invoke(const
std::_Any_data &) (__functor=...)
at
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/std_function.h:316
#7 0x00007ff97cf14b7c in std::function<void ()>::operator()() const
(this=0x6fb8f18)
at
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/std_function.h:706
#8 0x00007ff97a7163ce in doris::Thread::supervise_thread (arg=0x6fb8f00)
at /home/users/stdpain/doris/core/be/src/util/thread.cpp:386
#9 0x00007ff978cd21c3 in start_thread () from
/opt/compiler/gcc-4.8.2/lib64/libpthread.so.0
#10 0x00007ff9782f512d in clone () from
/opt/compiler/gcc-4.8.2/lib64/libc.so.6
```
Here is be.out after rebuilding with ASAN:
```
=================================================================
==54102==ERROR: AddressSanitizer: heap-use-after-free on address
0x6190000cddc8 at pc 0x000001d36929 bp 0x7fcbbb572b70 sp 0x7fcbbb572b68
READ of size 8 at 0x6190000cddc8 thread T233 (compaction_task)
#0 0x1d36928 in std::_Rb_tree<doris::DataDir*, std::pair<doris::DataDir*
const, std::vector<long, std::allocator<long> > >,
std::_Select1st<std::pair<doris::DataDir* const, std::vector<long,
std::allocator<long> > > >, std::less<doris::DataDir*>,
std::allocator<std::pair<doris::DataDir* const, std::vector<long,
std::allocator<long> > > > >::_M_begin()
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_tree.h:737
...
```
**To Reproduce**
The bug is hard to reproduce by chance, but I found a way to trigger it
reliably.
Modify be/service/doris_main.cpp as follows:
```
heartbeat_thrift_server = nullptr;
sleep(20); // modify here
doris::ExecEnv::destroy(exec_env);
return 0;
```
1. Run ./bin/start_be.sh
2. Kill the BE process
It seems that StorageEngine is deleted while a background thread is still
running. When that thread then tries to access StorageEngine, BE crashes.
**Expected behavior**
BE should not exit with a segmentation fault.
**Desktop (please complete the following information):**
- OS: CentOS 6
**Possible solutions**
Make StorageEngine inherit from std::enable_shared_from_this so that
background threads can keep it alive,
or
wait for the background threads to exit before StorageEngine is destroyed.
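The second option could look roughly like the sketch below. It is only an illustration of the stop-then-join pattern, not the actual Doris code: `EngineSketch`, `_producer_loop`, `ticks` and `run_shutdown_demo` are hypothetical names, and the real StorageEngine starts several background threads rather than one.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for StorageEngine; not the real Doris class.
class EngineSketch {
public:
    EngineSketch() {
        // Start the worker in the body so all other members are already built.
        _worker = std::thread([this] { _producer_loop(); });
    }

    // The destructor stops and joins the worker before any member is freed.
    ~EngineSketch() { stop(); }

    // Signal the background thread and wait for it to exit. Safe to call twice.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stopped = true;
        }
        _cv.notify_all();
        if (_worker.joinable()) {
            _worker.join();
        }
    }

    int ticks() const { return _ticks.load(); }

private:
    // Stand-in for a loop such as the compaction tasks producer.
    void _producer_loop() {
        std::unique_lock<std::mutex> lock(_mutex);
        while (!_stopped) {
            ++_ticks;  // touches engine state, as the real thread would
            // Sleep until the next round, but wake up early on stop().
            _cv.wait_for(lock, std::chrono::milliseconds(5),
                         [this] { return _stopped; });
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    bool _stopped = false;
    std::atomic<int> _ticks{0};
    std::thread _worker;
};

// Runs the worker briefly, stops it, and checks that it no longer touches
// the engine afterwards. Returns the number of loop rounds that ran.
int run_shutdown_demo() {
    EngineSketch engine;
    while (engine.ticks() == 0) {
        std::this_thread::yield();  // wait until the worker has run once
    }
    engine.stop();
    int after_stop = engine.ticks();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    assert(engine.ticks() == after_stop);  // worker has really exited
    return after_stop;
}
```

With this shape, destroying the engine can never race with the worker, because the join completes before any member is released. The condition variable keeps shutdown fast even when the loop sleeps between rounds.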