walterzhaoJR opened a new issue, #3243:
URL: https://github.com/apache/brpc/issues/3243

   **Describe the bug**
   release_tls_block() and release_tls_block_chain() in the IOBuf TLS block caching layer do not guard against a block being returned to TLS when it is already the head of the TLS list. This can create a self-referencing cycle (b->u.portal_next == b). Any subsequent traversal of the TLS chain, such as remove_tls_block_chain() (registered via thread_atexit) or share_tls_block(), then loops forever and hangs the thread permanently.
   
   In src/butil/iobuf_inl.h, release_tls_block():
   
   <img width="705" height="195" alt="Image" src="https://github.com/user-attachments/assets/98135c5d-5066-4228-8fe2-77811d25881e" />
   
   When b is already tls_data->block_head, the assignment b->u.portal_next = 
tls_data->block_head becomes b->u.portal_next = b, forming a single-node cycle.
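
Since the code above is only available as a screenshot, the following is a minimal sketch of the failure under stated assumptions. `Block`, `TlsCache`, and `release_unguarded` are simplified, hypothetical stand-ins, not brpc's real types or functions, and any other branches of the real function are omitted.

```cpp
#include <cassert>

// Simplified stand-ins for illustration only; not brpc's real types.
struct Block {
    Block* portal_next = nullptr;
};

struct TlsCache {
    Block* block_head = nullptr;
    int num_blocks = 0;
};

// Unguarded push onto the per-thread list, mirroring the two assignments
// described above.
void release_unguarded(TlsCache& tls, Block* b) {
    assert(b != nullptr);
    b->portal_next = tls.block_head;  // if b is already the head: b -> b
    tls.block_head = b;
    ++tls.num_blocks;
}
```

In this model, calling `release_unguarded(tls, &b)` twice with the same block leaves `b.portal_next == &b`, and `num_blocks` reads 2 although only one block is cached.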
   
   Similarly, in src/butil/iobuf.cpp, release_tls_block_chain():
   
   <img width="699" height="147" alt="Image" src="https://github.com/user-attachments/assets/65f10488-428f-409e-ac6f-1846a99b8325" />
   
   If the chain being returned contains the block that is currently the TLS head, last_b->u.portal_next can point back to first_b (which may be last_b itself), again forming a cycle.
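
The chain case can be sketched the same way. This is a simplified model under assumptions: `Block`, `TlsCache`, and `release_chain_unguarded` are hypothetical names, and only the tail search and splice described above are modeled.

```cpp
#include <cassert>

// Simplified stand-ins for illustration only; not brpc's real types.
struct Block {
    Block* portal_next = nullptr;
};

struct TlsCache {
    Block* block_head = nullptr;
    int num_blocks = 0;
};

// Unguarded chain return: find the tail of the chain, then splice the
// whole chain in front of the current TLS head.
void release_chain_unguarded(TlsCache& tls, Block* first_b) {
    assert(first_b != nullptr);
    int n = 1;
    Block* last_b = first_b;
    while (last_b->portal_next != nullptr) {
        last_b = last_b->portal_next;
        ++n;
    }
    last_b->portal_next = tls.block_head;  // may point back into the chain
    tls.block_head = first_b;
    tls.num_blocks += n;
}
```

If the cache already holds `a -> b` with `a` as head and the chain starting at `a` is returned again, the tail `b` is linked back to `a`, giving the two-node cycle `a -> b -> a`.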
   
   **How the Double-Return Happens**
   IOBufAsZeroCopyOutputStream::BackUp() calls 
iobuf::release_tls_block(_cur_block) to eagerly return the block to TLS so 
other code can reuse it:
   
   <img width="705" height="110" alt="Image" src="https://github.com/user-attachments/assets/0e1a9f13-113c-4942-9865-b5be8c06e63b" />
   
   After BackUp(), the block is the TLS list head (tls_data.block_head). If a subsequent operation (e.g., _release_block() during destruction of IOBufAsZeroCopyOutputStream, or a BackUp in IOBufAsSnappySink) calls release_tls_block() again with the same block pointer (obtained from a still-live BlockRef), the block is returned a second time, which triggers the self-loop.
   
   **Impact**
   - Thread hangs permanently in remove_tls_block_chain() (called at thread 
exit via thread_atexit), or in share_tls_block() / release_tls_block_chain() 
during normal I/O.
   - The hang is silent (no crash, no log, no error), which makes it extremely difficult to diagnose in production.
   - Any brpc application using protobuf serialization over IOBuf (which 
internally uses IOBufAsZeroCopyOutputStream) is potentially affected.
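
Because the hang produces no diagnostics, a debug-only cycle check on the cached chain could make the corruption visible at the point where it happens. The sketch below uses Floyd's tortoise-and-hare algorithm on a simplified, hypothetical `Block` type. `chain_has_cycle` and `debug_check_acyclic` are illustrative names, not brpc functions.

```cpp
#include <cassert>

// Simplified stand-in for illustration only; not brpc's real type.
struct Block {
    Block* portal_next = nullptr;
};

// Floyd's tortoise-and-hare: returns true if following portal_next from
// head ever revisits a node. Terminates on both cyclic and acyclic lists.
bool chain_has_cycle(const Block* head) {
    const Block* slow = head;
    const Block* fast = head;
    while (fast != nullptr && fast->portal_next != nullptr) {
        slow = slow->portal_next;
        fast = fast->portal_next->portal_next;
        if (slow == fast) {
            return true;
        }
    }
    return false;
}

// Debug-build check: aborts instead of letting a later traversal spin.
void debug_check_acyclic(const Block* head) {
    assert(!chain_has_cycle(head));
}
```

Such a check costs a full walk of the list, so it would only be reasonable in debug builds or while hunting this specific bug.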
   
   
   
   **To Reproduce**
   
   
   **Expected behavior**
   
   
   **Versions**
   OS:
   Compiler:
   brpc:
   protobuf:
   
   **Additional context/screenshots**
   
   **Suggested Fix**
   1. Guard release_tls_block() against double-return
   
   <img width="709" height="466" alt="Image" src="https://github.com/user-attachments/assets/5d8b7da1-1ff2-44ae-b63e-2283fcc3038b" />
   
   2. Guard release_tls_block_chain() against self-loop after linking
   
   <img width="683" height="279" alt="Image" src="https://github.com/user-attachments/assets/75b1f6bc-36e1-474f-af61-84ae76636994" />
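
The proposed patches are attached only as screenshots, so the following is a sketch of what the two guards might look like, not the actual patch. It uses simplified, hypothetical types (`Block`, `TlsCache`) and function names. The chain variant checks for the TLS head while walking the chain, before linking, which is an assumed alternative to the post-linking check named in item 2. Both guards only detect overlap at the TLS head, which is the scenario described in this report. Reference counting is not modeled.

```cpp
#include <cassert>

// Simplified stand-ins for illustration only; not brpc's real types.
struct Block {
    Block* portal_next = nullptr;
};

struct TlsCache {
    Block* block_head = nullptr;
    int num_blocks = 0;
};

// Guarded single-block return. Returns false if b is already the head.
bool release_guarded(TlsCache& tls, Block* b) {
    assert(b != nullptr);
    if (b == tls.block_head) {
        return false;  // pushing again would make b point to itself
    }
    b->portal_next = tls.block_head;
    tls.block_head = b;
    ++tls.num_blocks;
    return true;
}

// Guarded chain return. Returns the number of blocks newly cached.
// Assumes the incoming chain itself is acyclic.
int release_chain_guarded(TlsCache& tls, Block* first) {
    assert(first != nullptr);
    int n = 0;
    Block* last = nullptr;
    for (Block* b = first; b != nullptr; b = b->portal_next) {
        if (b == tls.block_head) {
            // The rest of the chain is already the cached list, so only
            // the n blocks before it are new and no relinking is needed.
            tls.block_head = first;
            tls.num_blocks += n;
            return n;
        }
        ++n;
        last = b;
    }
    last->portal_next = tls.block_head;
    tls.block_head = first;
    tls.num_blocks += n;
    return n;
}
```

A block that sits deeper in the TLS list than the head would pass both checks. Catching that case would need a full scan of the list or a per-block "cached" flag, which is a separate design decision.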
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

