vagetablechicken opened a new issue #4534:
URL: https://github.com/apache/incubator-doris/issues/4534


   **Describe the bug**
   ```
   (gdb) bt
   #0  0x0000000000ff2216 in doris::TabletsChannel::add_batch (this=this@entry=0x4d68d2c0, params=...)
       at /builds/olap/doris/be/src/runtime/tablets_channel.cpp:84
   #1  0x0000000000feed46 in doris::LoadChannel::add_batch (this=this@entry=0x345f62da00, request=..., tablet_vec=tablet_vec@entry=0xf5acf18)
       at /builds/olap/doris/be/src/runtime/load_channel.cpp:92
   #2  0x0000000000fea341 in doris::LoadChannelMgr::add_batch (this=0x8d7bbf00, request=..., tablet_vec=0xf5acf18,
       wait_lock_time_ns=wait_lock_time_ns@entry=0x7f6b4842b710) at /builds/olap/doris/be/src/runtime/load_channel_mgr.cpp:149
   #3  0x0000000001071c25 in doris::PInternalServiceImpl<palo::PInternalService>::tablet_writer_add_batch(google::protobuf::RpcController*, doris::PTabletWriterAddBatchRequest const*, doris::PTabletWriterAddBatchResult*, google::protobuf::Closure*)::{lambda()#1}::operator()() const (
       __closure=0xa90f1cc00) at /builds/olap/doris/be/src/service/internal_service.cpp:109
   #4  0x0000000000f854f5 in operator() (this=0x7f6b4842b7e8) at /var/local/thirdparty/installed/include/boost/function/function_template.hpp:759
   #5  doris::PriorityThreadPool::work_thread (this=0x1ba39210, thread_id=<optimized out>)
       at /builds/olap/doris/be/src/util/priority_thread_pool.hpp:138
   #6  0x0000000001a42abd in thread_proxy ()
   #7  0x00007f6c31981dc5 in start_thread () from /lib64/libpthread.so.0
   #8  0x00007f6c31c8d73d in clone () from /lib64/libc.so.6
   (gdb) f 0
   (gdb) p _state
   $12 = doris::TabletsChannel::kInitialized
   ```
   The segmentation fault occurs here:
   
https://github.com/apache/incubator-doris/blob/068707484d16bbb26a7944baf89244ced07c2471/be/src/runtime/tablets_channel.cpp#L86
   
   gdb shows that `TabletsChannel::_state` is `kInitialized`, so `TabletsChannel::add_batch()` was called before `TabletsChannel::open()`. As a result, `_next_seqs` was never resized and still has size 0.
   
https://github.com/apache/incubator-doris/blob/e71152132c460d3524a3b4b898b5fa10c5888523/be/src/runtime/tablets_channel.cpp#L54
   
   The root cause is this piece:
   
https://github.com/apache/incubator-doris/blob/10f822eb4353cbf9769d1ae6ada5632480d5f30e/be/src/runtime/load_channel.cpp#L44-L58
   An `open` request reaches Line 54: `_tablets_channels` now contains the tablet channel (which is not opened yet), and the lock is released. An `add_batch` request arriving at that moment can take the lock and get this tablet channel, although the `open` request has not reached Line 58 yet. The `add_batch` request then accesses `_next_seqs[]`, which causes the segfault.
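
The interleaving described above can be replayed deterministically in a single thread. This is only a minimal sketch: `MockTabletsChannel`, `MockLoadChannel`, `publish`, `find`, and `replay_race` are hypothetical names invented for illustration and are not the Doris classes. It assumes only what the issue states: the channel is inserted into the map under the lock, and `open()` runs after the lock has been released.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for TabletsChannel: only the state and the
// per-sender sequence vector are modeled.
struct MockTabletsChannel {
    enum State { kInitialized, kOpened, kFinished };
    State state = kInitialized;
    std::vector<int64_t> next_seqs;  // stays empty until open() runs

    void open(int num_senders) {
        next_seqs.assign(num_senders, 0);
        state = kOpened;
    }
};

// Hypothetical stand-in for LoadChannel: the map is guarded by a lock,
// but opening the channel happens outside of it.
struct MockLoadChannel {
    std::mutex lock;
    std::map<int64_t, std::shared_ptr<MockTabletsChannel>> tablets_channels;

    // First half of an `open` request: publish the channel under the lock.
    std::shared_ptr<MockTabletsChannel> publish(int64_t index_id) {
        std::lock_guard<std::mutex> l(lock);
        auto& slot = tablets_channels[index_id];
        if (!slot) slot = std::make_shared<MockTabletsChannel>();
        return slot;
    }

    // What an `add_batch` request does first: look the channel up under the lock.
    std::shared_ptr<MockTabletsChannel> find(int64_t index_id) {
        std::lock_guard<std::mutex> l(lock);
        auto it = tablets_channels.find(index_id);
        return it == tablets_channels.end() ? nullptr : it->second;
    }
};

// Replays the interleaving; returns true if `add_batch` would have seen
// a published but unopened channel with an empty next_seqs.
bool replay_race() {
    MockLoadChannel lc;
    auto opening = lc.publish(1);  // `open` request: channel inserted, lock released
    auto seen = lc.find(1);        // `add_batch` request arrives in the gap
    bool unsafe = seen != nullptr &&
                  seen->state == MockTabletsChannel::kInitialized &&
                  seen->next_seqs.empty();
    opening->open(2);              // `open` request finally opens the channel
    assert(seen->next_seqs.size() == 2);  // same object, sized only now
    return unsafe;
}
```

In this model `replay_race()` returns `true`: the lookup succeeds while the vector is still empty, so indexing it at that point would be out of bounds.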
   
   **Expected behavior**
   `TabletsChannel::add_batch` currently checks only for `kFinished`:
   
https://github.com/apache/incubator-doris/blob/e71152132c460d3524a3b4b898b5fa10c5888523/be/src/runtime/tablets_channel.cpp#L83
   It should also verify that the state is `kOpened`, e.g.
   ```
   if (_state != kOpened) {
       return Status::InternalError(strings::Substitute("tablet channel isn't opened, state=$0", _state));
   }
   ```
   
   The receiver side can't retry, so it should just return an error.
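
To illustrate the effect of the proposed check, here is a minimal self-contained sketch. The `Status` struct and `MockTabletsChannel` class below are simplified, hypothetical stand-ins and not the real Doris types; the `sender_id` bounds check is an extra assumption added for the sketch. The point is only that a call arriving before `open()` gets an error back instead of indexing an empty `_next_seqs`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for doris::Status.
struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return {true, ""}; }
    static Status InternalError(std::string m) { return {false, std::move(m)}; }
};

// Hypothetical stand-in for TabletsChannel with the proposed state check.
class MockTabletsChannel {
public:
    enum State { kInitialized, kOpened, kFinished };

    void open(int num_senders) {
        _next_seqs.assign(num_senders, 0);
        _state = kOpened;
    }

    Status add_batch(int sender_id, int64_t packet_seq) {
        // Proposed guard: anything other than kOpened is rejected before
        // _next_seqs is touched.
        if (_state != kOpened) {
            return Status::InternalError("tablet channel isn't opened, state=" +
                                         std::to_string(static_cast<int>(_state)));
        }
        // Extra bounds check, an assumption of this sketch.
        if (sender_id < 0 || static_cast<std::size_t>(sender_id) >= _next_seqs.size()) {
            return Status::InternalError("unknown sender_id");
        }
        _next_seqs[sender_id] = packet_seq + 1;  // safe: open() has sized the vector
        return Status::OK();
    }

private:
    State _state = kInitialized;
    std::vector<int64_t> _next_seqs;
};
```

With this guard, calling `add_batch` on a freshly constructed channel returns an error status, and the same call succeeds once `open()` has run. The sender then sees a failed load rather than the backend crashing.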




