Mryange opened a new pull request, #57888:
URL: https://github.com/apache/doris/pull/57888

   ### What problem does this PR solve?
   
   We previously had a crash. The cause is that we should not access the 
request after calling add_block(...) because add_block may enqueue a closure 
that runs on another thread and frees the request
   
   ```
   ==730145==ERROR: AddressSanitizer: heap-buffer-overflow on address 
0x7be1efd803a0 at pc 0x556b38d9d625 bp 0x7b16bf0193f0 sp 0x7b16bf0193e8
   READ of size 4 at 0x7be1efd803a0 thread T1559
       #0 0x556b38d9d624 in 
google::protobuf::internal::RepeatedPtrFieldBase::size() const 
/home/zcp/repo_center/doris_master/doris/thirdparty/installed/include/google/protobuf/repeated_ptr_field.h:185:29
       #1 0x556b408ab062 in 
google::protobuf::RepeatedPtrField<doris::PBlock>::size() const 
/home/zcp/repo_center/doris_master/doris/thirdparty/installed/include/google/protobuf/repeated_ptr_field.h:1248:32
       #2 0x556b408aaff4 in doris::PTransmitDataParams::_internal_blocks_size() 
const 
/home/zcp/repo_center/doris_master/doris/be/../gensrc/build/gen_cpp/internal_service.pb.h:32149:25
       #3 0x556b4089731c in doris::PTransmitDataParams::blocks_size() const 
/home/zcp/repo_center/doris_master/doris/be/../gensrc/build/gen_cpp/internal_service.pb.h:32152:10
       #4 0x556b60a83c17 in 
doris::vectorized::VDataStreamMgr::transmit_block(doris::PTransmitDataParams 
const*, google::protobuf::Closure**, long) 
/home/zcp/repo_center/doris_master/doris/be/src/vec/runtime/vdata_stream_mgr.cpp:150:38
       #5 0x556b407f7408 in 
doris::PInternalService::_transmit_block(google::protobuf::RpcController*, 
doris::PTransmitDataParams const*, doris::PTransmitDataResult*, 
google::protobuf::Closure*, doris::Status const&, long) 
/home/zcp/repo_center/doris_master/doris/be/src/service/internal_service.cpp:1673:40
       #6 0x556b407f52bb in 
doris::PInternalService::transmit_block(google::protobuf::RpcController*, 
doris::PTransmitDataParams const*, doris::PTransmitDataResult*, 
google::protobuf::Closure*) 
/home/zcp/repo_center/doris_master/doris/be/src/service/internal_service.cpp:1610:9
       #7 0x556b43fceba2 in 
doris::PBackendService::CallMethod(google::protobuf::MethodDescriptor const*, 
google::protobuf::RpcController*, google::protobuf::Message const*, 
google::protobuf::Message*, google::protobuf::Closure*) 
/home/zcp/repo_center/doris_master/doris/gensrc/build/gen_cpp/internal_service.pb.cc:49452:7
       #8 0x556b6736273e in 
brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*) 
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x770bd73e)
       #9 0x556b67357426 in brpc::ProcessInputMessage(void*) 
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x770b2426)
       #10 0x556b67357f20 in 
brpc::InputMessenger::InputMessageClosure::~InputMessageClosure() 
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x770b2f20)
       #11 0x556b673588dd in brpc::InputMessenger::OnNewMessages(brpc::Socket*) 
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x770b38dd)
       #12 0x556b674a0adc in brpc::Socket::ProcessEvent(void*) 
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x771fbadc)
       #13 0x556b672e0f76 in bthread::TaskGroup::task_runner(long) 
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x7703bf76)
       #14 0x556b672cbbe0 in bthread_make_fcontext 
(/mnt/hdd01/selectdb-cloud-chaos/cluster0/be/lib/doris_be+0x77026be0)
   ```
   
   ```
           Note: The done pointer will be saved in add_block and may be called 
in another thread via done->Run().
           For example, when blocks_size == 1, the process is as follows:
           transmit_block (i=0)
             └─> recvr->add_block(..., done, ...)  // Pass done
                  └─> SenderQueue::add_block
                       └─> _pending_closures.push(done)  // done is saved
   
           get_batch() [another thread]
             └─> closure_pair.first->Run()  // ⚠️ done->Run() is called
                  └─> brpc releases request and response
   
           transmit_block (i=1)  [original thread continues]
             └─> request->blocks_size()  // ⚠️ request has already been 
released!
   
           At this point, a use-after-free issue occurs.
   
           TODO: We should consider refactoring this part because add_block may 
release the request.
           We should not access the request after calling add_block.
   ```
   
   
   
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to