hanahmily opened a new issue, #13565:
URL: https://github.com/apache/skywalking/issues/13565

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Apache SkyWalking Component
   
   BanyanDB (apache/skywalking-banyandb)
   
   ### What happened
   
   
   When a part in the liaison's sending queue fails to send, it blocks other 
parts from being sent. We need a general fix that applies to every data model 
using the sending queue. The proposed solution has three parts:
   
   1. **Backoff Retry Component**: 
      - Introduce a backoff retry mechanism to handle failed parts. 
      - This component should operate in synchronous mode to ensure proper 
handling of retries.
   
   2. **Handling Persistent Failures**:
      - If all retries fail, move the failed part to a separate folder 
located under the shard folder.
   
   3. **Metrics for Monitoring**: 
      - Implement metrics that help pinpoint which parts have failed, allowing 
for easier tracking and debugging.
   
   **Additional Notes:**
   - Failed parts are kept on disk rather than deleted, for debugging 
purposes. They should only be removed manually, so that no data is lost 
unintentionally.
   
   This change will improve the reliability of the sending queue and make 
failures easier to troubleshoot.
   
   
   ### What you expected to happen
   
   Failed parts should be handled gracefully without affecting the processing 
of other parts, with clear metrics for monitoring failures. 
   
   ### How to reproduce
   
   1. Make one part in the liaison's sending queue fail to send.
   2. Observe that the other parts in the queue are blocked from being sent.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit a pull request to fix on your own?
   
   - [ ] Yes I am willing to submit a pull request on my own!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

