stdpain opened a new issue #5213:
URL: https://github.com/apache/incubator-doris/issues/5213


   **Describe the bug**
   BE may intermittently crash with a segmentation fault when it exits.
   This bug does not affect functionality, but it may make subsequent 
troubleshooting, such as heap profiling, more difficult.
   
   Here is a core dump (master, debug build):
   
   ```
   Core was generated by `/home/users/stdpain/opt/doris-deploy/be/lib/palo_be'.
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0  0x00007ff97bb7d09c in 
__gnu_cxx::__normal_iterator<doris::TabletManager::tablets_shard*, 
std::vector<doris::TabletManager::tablets_shard, 
std::allocator<doris::TabletManager::tablets_shard> > >::__normal_iterator 
(this=0x7ff904d442b8, __i=<error reading variable>)
       at 
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_iterator.h:780
   780           : _M_current(__i) { }
   [Current thread is 1 (LWP 38702)]
   warning: File 
"/ssd1/opt/fenghaoasuch/workspace/doris/workspace/doris-toolchain/gcc730/lib64/libstdc++.so.6.0.24-gdb.py"
 auto-loading has been declined by your `auto-load safe-path' set to 
"$debugdir:$datadir/auto-load".
   (gdb) bt
   #0  0x00007ff97bb7d09c in 
__gnu_cxx::__normal_iterator<doris::TabletManager::tablets_shard*, 
std::vector<doris::TabletManager::tablets_shard, 
std::allocator<doris::TabletManager::tablets_shard> > >::__normal_iterator 
(this=0x7ff904d442b8, __i=<error reading variable>)
       at 
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_iterator.h:780
   #1  0x00007ff97bb7b3df in std::vector<doris::TabletManager::tablets_shard, 
std::allocator<doris::TabletManager::tablets_shard> >::begin (this=0x8)
       at 
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_vector.h:564
   #2  0x00007ff97bb6ed37 in 
doris::TabletManager::find_best_tablet_to_compaction (this=0x0,
       compaction_type=doris::CUMULATIVE_COMPACTION, data_dir=0x55fca00,
       tablet_submitted_compaction=std::vector of length 0, capacity 0)
       at /home/users/stdpain/doris/core/be/src/olap/tablet_manager.cpp:681
   #3  0x00007ff97ba76a83 in doris::StorageEngine::_compaction_tasks_generator 
(this=0x558cc00,
       compaction_type=doris::CUMULATIVE_COMPACTION,
       data_dirs=std::vector of length 1, capacity 1 = {...})
       at /home/users/stdpain/doris/core/be/src/olap/olap_server.cpp:397
   #4  0x00007ff97ba764d5 in 
doris::StorageEngine::_compaction_tasks_producer_callback (this=0x558cc00)
       at /home/users/stdpain/doris/core/be/src/olap/olap_server.cpp:337
   #5  0x00007ff97ba73d39 in doris::StorageEngine::<lambda()>::operator()(void) 
const (
       __closure=0x6fb8f18) at 
/home/users/stdpain/doris/core/be/src/olap/olap_server.cpp:78
   #6  0x00007ff97ba77ae1 in std::_Function_handler<void(), 
doris::StorageEngine::start_bg_threads()::<lambda()> >::_M_invoke(const 
std::_Any_data &) (__functor=...)
       at 
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/std_function.h:316
   #7  0x00007ff97cf14b7c in std::function<void ()>::operator()() const 
(this=0x6fb8f18)
       at 
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/std_function.h:706
   #8  0x00007ff97a7163ce in doris::Thread::supervise_thread (arg=0x6fb8f00)
       at /home/users/stdpain/doris/core/be/src/util/thread.cpp:386
   #9  0x00007ff978cd21c3 in start_thread () from 
/opt/compiler/gcc-4.8.2/lib64/libpthread.so.0
   #10 0x00007ff9782f512d in clone () from 
/opt/compiler/gcc-4.8.2/lib64/libc.so.6
   ```
   
   Here is be.out after rebuilding with ASAN:
   ```
   =================================================================
   ==54102==ERROR: AddressSanitizer: heap-use-after-free on address 
0x6190000cddc8 at pc 0x000001d36929 bp 0x7fcbbb572b70 sp 0x7fcbbb572b68
   READ of size 8 at 0x6190000cddc8 thread T233 (compaction_task)
       #0 0x1d36928 in std::_Rb_tree<doris::DataDir*, std::pair<doris::DataDir* 
const, std::vector<long, std::allocator<long> > >, 
std::_Select1st<std::pair<doris::DataDir* const, std::vector<long, 
std::allocator<long> > > >, std::less<doris::DataDir*>, 
std::allocator<std::pair<doris::DataDir* const, std::vector<long, 
std::allocator<long> > > > >::_M_begin() 
/ssd1/opt/stdpain/workspace/doris/workspace/doris-toolchain/gcc730/include/c++/7.3.0/bits/stl_tree.h:737
       ...
   ```
   
   **To Reproduce**
   The bug is hard to reproduce, but I found a way to trigger it reliably.
   
   Modify be/service/doris_main.cpp to add a sleep before ExecEnv is destroyed:
   ```
       heartbeat_thrift_server = nullptr;
       sleep(20); // modify here
       doris::ExecEnv::destroy(exec_env);
       return 0;
   ```
   
   1. Run ./bin/start_be.sh
   2. Kill the BE process
   
   It seems that StorageEngine is deleted while a background thread is still 
running. When that thread then tries to access StorageEngine, BE crashes.
   
   
   
   **Expected behavior**
   BE shouldn't exit with a segmentation fault.
   
   
   **Desktop (please complete the following information):**
   
    - OS: CentOS 6
   
   **Possible solutions**
    Make StorageEngine inherit from std::enable_shared_from_this,
    or
    wait for the background threads to exit before StorageEngine is destroyed.
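
The second option could look roughly like the sketch below. It is only an illustration under assumptions: `Engine`, `produce_loop`, `stop`, and `rounds` are hypothetical names, not the real StorageEngine API, and the loop body is a placeholder for the compaction task producer. The idea is that the destructor signals the background thread and joins it, so the thread can never touch the object after it is freed.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for StorageEngine; not the real Doris class.
class Engine {
public:
    // The worker is started last, after every other member is constructed.
    Engine() : _worker([this] { produce_loop(); }) {}

    // Join the worker while all members are still alive.
    ~Engine() { stop(); }

    // Ask the background thread to exit and wait for it. Safe to call twice.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stopped = true;
        }
        _cv.notify_all();
        if (_worker.joinable()) {
            _worker.join();
        }
    }

    int rounds() const { return _rounds.load(); }

private:
    void produce_loop() {
        std::unique_lock<std::mutex> lock(_mutex);
        while (!_stopped) {
            ++_rounds;  // placeholder for generating compaction tasks
            // Sleep until the next round, but wake up early when stop() is called.
            _cv.wait_for(lock, std::chrono::milliseconds(10),
                         [this] { return _stopped; });
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    bool _stopped = false;          // guarded by _mutex
    std::atomic<int> _rounds{0};
    std::thread _worker;            // declared last so it is initialized last
};
```

Using a condition variable instead of a plain sleep lets the thread react to the stop request immediately, so shutdown does not have to wait out a full polling interval.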

