xy720 opened a new pull request, #24840:
URL: https://github.com/apache/doris/pull/24840

   ## Proposed changes
   
   In version 1.2.7, we found that the BE core dumps about once a month. The core dump backtrace is:
   
   ```
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0  0x00007fb96ac13d00 in pthread_mutex_lock () from /lib64/libpthread.so.0
   #1  0x000055a571705bf6 in __gthread_mutex_lock (__mutex=0x725f74756f656dd1)
       at /var/local/ldb-toolchain/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749
   #2  std::mutex::lock (this=0x725f74756f656dd1) at /var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:100
   #3  std::unique_lock<std::mutex>::lock (this=0x7fb87c7efc00) at /var/local/ldb-toolchain/include/c++/11/bits/unique_lock.h:139
   #4  std::unique_lock<std::mutex>::unique_lock (__m=..., this=0x7fb87c7efc00)
       at /var/local/ldb-toolchain/include/c++/11/bits/unique_lock.h:69
   #5  doris::BlockingQueue<doris::vectorized::ScannerContext*>::blocking_put (val=<synthetic pointer>: <optimized out>,
       this=0x725f74756f656d69) at /data/doris-1.x/be/src/util/blocking_queue.hpp:103
   #6  doris::vectorized::ScannerScheduler::submit (this=<optimized out>, ctx=0x7fb470295e00)
       at /data/doris-1.x/be/src/vec/exec/scan/scanner_scheduler.cpp:100
   #7  0x000055a5716c9360 in doris::vectorized::VScanNode::_start_scanners (this=this@entry=0x7fb389e17000, scanners=...)
       at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1290
   #8  0x000055a5716d9e53 in doris::vectorized::VScanNode::open (this=0x7fb389e17000, state=<optimized out>)
       at /data/doris-1.x/be/src/vec/exec/scan/vscan_node.cpp:102
   #9  0x000055a56c777eef in doris::PlanFragmentExecutor::open_vectorized_internal (this=this@entry=0x7fb3e9c42680)
       at /var/local/ldb-toolchain/include/c++/11/bits/unique_ptr.h:421
   #10 0x000055a56c7795ae in doris::PlanFragmentExecutor::open (this=this@entry=0x7fb3e9c42680)
       at /data/doris-1.x/be/src/runtime/plan_fragment_executor.cpp:261
   #11 0x000055a56c751c79 in doris::FragmentExecState::execute (this=0x7fb3e9c42600)
       at /data/doris-1.x/be/src/runtime/fragment_mgr.cpp:260
   #12 0x000055a56c7552ce in doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) (this=this@entry=0x7fb8d0e25000, exec_state=..., cb=...)
       at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1290
   #13 0x000055a56c755802 in operator() (__closure=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/ext/atomicity.h:109
   #14 std::__invoke_impl<void, doris::FragmentMgr::exec_plan_fragment(const doris::TExecPlanFragmentParams&, doris::FragmentMgr::FinishCallback)::<lambda()>&> (__f=...) at /var/local/ldb-toolchain/include/c++/11/bits/invoke.h:61
   #15 std::__invoke_r<void, doris::FragmentMgr::exec_plan_fragment(const doris::TExecPlanFragmentParams&, doris::FragmentMgr::FinishCallback)::<lambda()>&> (__fn=...) at /var/local/ldb-toolchain/include/c++/11/bits/invoke.h:111
   #16 std::_Function_handler<void(), doris::FragmentMgr::exec_plan_fragment(const doris::TExecPlanFragmentParams&, doris::FragmentMgr::FinishCallback)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
       at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:291
   #17 0x000055a56ca0d475 in std::function<void ()>::operator()() const (this=<optimized out>)
       at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:556
   #18 doris::FunctionRunnable::run (this=<optimized out>) at /data/doris-1.x/be/src/util/threadpool.cpp:46
   #19 doris::ThreadPool::dispatch_thread (this=0x7fb933ceea80) at /data/doris-1.x/be/src/util/threadpool.cpp:535
   #20 0x000055a56ca028cf in std::function<void ()>::operator()() const (this=0x7faf3395f078)
       at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:556
   #21 doris::Thread::supervise_thread (arg=0x7faf3395f060) at /data/doris-1.x/be/src/util/thread.cpp:454
   #22 0x00007fb96ac11ea5 in start_thread () from /lib64/libpthread.so.0
   #23 0x00007fb96af249fd in clone () from /lib64/libc.so.6
   ```
   
   The program hit a segmentation fault at memory address 0x725f74756f656dd1:
   
   ```
   (gdb) x 0x725f74756f656dd1
   0x725f74756f656dd1:     Cannot access memory at address 0x725f74756f656dd1
   ```
   
   This address is not actually a valid mutex address; it is a garbage address produced by an out-of-bounds memory access.
   
   Here is the proof:
   
   ```
   (gdb) f 6
   #6  doris::vectorized::ScannerScheduler::submit (this=<optimized out>, ctx=0x7fb470295e00)
       at /data/doris-1.x/be/src/vec/exec/scan/scanner_scheduler.cpp:100
   100         if (!_pending_queues[ctx->queue_idx]->blocking_put(ctx)) {
   (gdb) p ctx->queue_idx
   $2 = -3
   (gdb) f 7
   #7  0x000055a5716c9360 in doris::vectorized::VScanNode::_start_scanners (this=this@entry=0x7fb389e17000, scanners=...)
       at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1290
   1290    /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h: No such file or directory.
   (gdb) p _state->_exec_env->_scanner_scheduler->_pending_queues[-3]->_lock
   Cannot access memory at address 0x725f74756f656dd1
   ```
   
   The program accesses address 0x725f74756f656dd1 because queue_idx is set to -3, so `_pending_queues[-3]` indexes memory outside the queue array.
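   To make the failure mode concrete, here is a contained sketch (hypothetical layout and names, not the actual Doris code, assuming 8-byte pointers as on x86-64): a negative subscript is plain pointer arithmetic, so `_pending_queues[-3]` reads whatever bytes happen to sit before the array and treats them as a queue pointer. That matches the bogus `this=0x725f74756f656d69` in frame #5, whose internal mutex then lands at the unmapped address above.
   
   ```cpp
   #include <cstdint>
   #include <cstdio>
   #include <cstring>
   
   struct BlockingQueue {};  // stand-in for doris::BlockingQueue
   
   int main() {
       // Simulated process memory: unrelated data happens to sit directly
       // before the pointer array, as in the crashed process.
       struct Layout {
           std::uintptr_t before[3];  // bytes preceding the array
           BlockingQueue* queues[4];  // the round-robin queue array
       } mem = {{0x725f74756f656d69ULL, 0, 0}, {}};
   
       // What `_pending_queues[-3]` effectively computes: read the slot three
       // pointers before the array start. (Undefined behavior in the real
       // code; emulated here with memcpy inside a single object for safety.)
       BlockingQueue* bogus;
       std::memcpy(&bogus, reinterpret_cast<const char*>(mem.queues) - 3 * sizeof(void*),
                   sizeof bogus);
       std::printf("bogus queue pointer: %p\n", static_cast<void*>(bogus));
       return 0;
   }
   ```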
   
   The root cause is that the atomic_int `_queue_idx` in ScannerScheduler is incremented on every submit, eventually wraps past INT_MAX to a negative value, and the subsequent modulo keeps the sign of the dividend.
   
   So we should use atomic_uint instead: unsigned wrap-around is well-defined, and the modulo result can never be negative.
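   As a minimal, self-contained demonstration (assuming a queue count of 4 purely for illustration; the real constant lives in scanner_scheduler.cpp), the following shows both the signed wrap-around and why the unsigned counter is safe:
   
   ```cpp
   #include <atomic>
   #include <cstdio>
   #include <limits>
   
   int main() {
       constexpr int kQueueNum = 4;  // assumed queue count for illustration
   
       // Buggy: std::atomic arithmetic wraps modulo 2^N even for signed types,
       // so incrementing past INT_MAX yields INT_MIN, and C++'s % keeps the
       // sign of the dividend.
       std::atomic<int> signed_idx{std::numeric_limits<int>::max()};
       signed_idx++;  // INT_MAX -> INT_MIN
       signed_idx++;  // INT_MIN + 1 == -2147483647
       std::printf("signed:   %d\n", signed_idx.load() % kQueueNum);    // prints -3
   
       // Fixed: the unsigned counter wraps from UINT_MAX to 0, and unsigned
       // modulo can never be negative, so the index stays in [0, kQueueNum).
       std::atomic<unsigned> unsigned_idx{std::numeric_limits<unsigned>::max()};
       unsigned_idx++;  // UINT_MAX -> 0
       unsigned_idx++;  // 1
       std::printf("unsigned: %u\n", unsigned_idx.load() % kQueueNum);  // prints 1
       return 0;
   }
   ```
   
   Note that -2147483647 % 4 is exactly the -3 observed in the core, consistent with the counter having recently wrapped.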
   
   
   

