HaoTien opened a new pull request, #12765:
URL: https://github.com/apache/apisix/pull/12765
# feat: Add error ratio-based circuit breaking policy to api-breaker plugin
## What this PR does / why we need it
This PR implements error ratio-based circuit breaking (`unhealthy-ratio`
policy) for the `api-breaker` plugin, providing more intelligent and adaptive
circuit breaking behavior based on error rates within a sliding time window,
rather than just consecutive failure counts.
**Closes #12763**
## Types of changes
- [x] New feature (non-breaking change which adds functionality)
- [x] Documentation update
## Description
### Current Limitations
- The existing count-based approach only considers consecutive failures.
- It does not account for the overall error rate relative to total requests.
- It may be too sensitive during low-traffic periods and not sensitive enough during high-traffic periods.
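To make the second limitation concrete, here is a small illustrative sketch (not code from this PR). It compares a consecutive-failure counter with an overall error ratio on a synthetic traffic pattern. The helper names and the traffic pattern are hypothetical; the unhealthy status list mirrors the example configuration in this description.

```python
# Illustrative only: helper names and traffic pattern are hypothetical.
UNHEALTHY = (500, 502, 503, 504)


def max_consecutive_failures(statuses, unhealthy=UNHEALTHY):
    """Return the longest run of consecutive unhealthy responses."""
    longest = run = 0
    for status in statuses:
        # A healthy response resets the run, as a consecutive counter would.
        run = run + 1 if status in unhealthy else 0
        longest = max(longest, run)
    return longest


def error_ratio(statuses, unhealthy=UNHEALTHY):
    """Return the fraction of responses that are unhealthy."""
    if not statuses:
        return 0.0
    return sum(1 for s in statuses if s in unhealthy) / len(statuses)


# Two failures followed by one success, repeated ten times.
traffic = [500, 500, 200] * 10

longest_run = max_consecutive_failures(traffic)
ratio = error_ratio(traffic)
```

With this pattern the longest failure run is 2, so a breaker that needs 3 consecutive failures would not trip, even though two thirds of the requests fail.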
### New Features Added
- **Error ratio-based circuit breaking**: New `unhealthy-ratio` policy that
trips the circuit breaker based on the error rate within a sliding time window
- **Configurable parameters**: Error ratio threshold, minimum request
threshold, sliding window size, permitted calls in half-open state, and
success ratio threshold
- **Circuit breaker states**: Proper implementation of CLOSED, OPEN, and
HALF_OPEN states
- **Backward compatibility**: Existing configurations continue to work
without changes
### New Configuration Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `policy` | string | `"unhealthy-count"` | Circuit breaker policy |
| `unhealthy.error_ratio` | number | `0.5` | Error rate threshold (0-1) to trigger circuit breaker |
| `unhealthy.min_request_threshold` | integer | `10` | Minimum requests needed before evaluating error rate |
| `unhealthy.sliding_window_size` | integer | `300` | Sliding window size in seconds for error rate calculation |
| `unhealthy.permitted_number_of_calls_in_half_open_state` | integer | `3` | Number of permitted calls in half-open state |
| `healthy.success_ratio` | number | `0.6` | Success rate threshold to close circuit breaker from half-open state |
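As a rough sketch of how the `unhealthy.*` parameters above could interact, the following Python model keeps timestamped outcomes in a window and decides whether to trip. It is not the plugin's Lua implementation. The class name `RatioWindow`, the use of a deque, and the choice of `>=` for the threshold comparison are all assumptions made for illustration.

```python
from collections import deque


class RatioWindow:
    """Hypothetical model of a sliding-window error-ratio check."""

    def __init__(self, error_ratio=0.5, min_request_threshold=10,
                 sliding_window_size=300,
                 unhealthy_statuses=(500, 502, 503, 504)):
        self.error_ratio = error_ratio
        self.min_request_threshold = min_request_threshold
        self.sliding_window_size = sliding_window_size  # seconds
        self.unhealthy_statuses = unhealthy_statuses
        self._events = deque()  # (timestamp, is_error) pairs, oldest first

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.sliding_window_size
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def record(self, now, status):
        self._events.append((now, status in self.unhealthy_statuses))
        self._evict(now)

    def should_trip(self, now):
        self._evict(now)
        total = len(self._events)
        # Too few samples: skip the ratio check entirely.
        if total < self.min_request_threshold:
            return False
        errors = sum(1 for _, is_error in self._events if is_error)
        # Assumption: reaching the threshold exactly counts as tripping.
        return errors / total >= self.error_ratio
```

In this model, nine failures alone do not trip the breaker because `min_request_threshold` is 10. The tenth failure does, and once every event is older than `sliding_window_size` seconds the window is empty again.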
### Example Configuration
```json
{
  "plugins": {
    "api-breaker": {
      "break_response_code": 503,
      "policy": "unhealthy-ratio",
      "max_breaker_sec": 60,
      "unhealthy": {
        "http_statuses": [500, 502, 503, 504],
        "error_ratio": 0.5,
        "min_request_threshold": 10,
        "sliding_window_size": 300,
        "permitted_number_of_calls_in_half_open_state": 3
      },
      "healthy": {
        "http_statuses": [200, 201, 202],
        "success_ratio": 0.6
      }
    }
  }
}
```
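With the values in this example, the half-open decision might work roughly as sketched below. This is an assumption-laden illustration, not the PR's code. The function name `evaluate_half_open`, the rule that the decision waits until all permitted probe calls have completed, and the `>=` comparison are hypothetical.

```python
def evaluate_half_open(probe_statuses,
                       permitted_calls=3,
                       success_ratio=0.6,
                       healthy_statuses=(200, 201, 202)):
    """Return the next breaker state given the probe responses so far."""
    # Assumption: wait until all permitted probe calls have completed.
    if len(probe_statuses) < permitted_calls:
        return "HALF_OPEN"
    probes = probe_statuses[:permitted_calls]
    successes = sum(1 for s in probes if s in healthy_statuses)
    # Assumption: meeting the threshold exactly is enough to close.
    if successes / permitted_calls >= success_ratio:
        return "CLOSED"
    return "OPEN"


after_two_of_three_ok = evaluate_half_open([200, 200, 500])
after_one_of_three_ok = evaluate_half_open([200, 500, 500])
still_probing = evaluate_half_open([200])
```

Under these assumptions, two healthy probes out of three (about 0.67) meet the 0.6 threshold and close the breaker, while one out of three reopens it.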
## How Has This Been Tested?
- [x] Schema validation tests for new parameters
- [x] Functional tests for error ratio calculation
- [x] Circuit breaker state transition tests
- [x] Integration tests with various traffic patterns
- [x] Backward compatibility tests
- [x] Performance tests to ensure no regression
### Test Results
```bash
# Run the new test file
prove -I. -r t/plugin/api-breaker2.t
# Verify existing tests still pass
prove -I. -r t/plugin/api-breaker.t
```
## Files Modified
- `apisix/plugins/api-breaker.lua` - Core plugin logic with new ratio-based
policy
- `t/plugin/api-breaker2.t` - New comprehensive test file for ratio-based
circuit breaking
- `docs/en/latest/plugins/api-breaker.md` - Updated English documentation
- `docs/zh/latest/plugins/api-breaker.md` - Updated Chinese documentation
## Checklist
- [x] My code follows the code style of this project
- [x] My change requires a change to the documentation
- [x] I have updated the documentation accordingly
- [x] I have read the **CONTRIBUTING** document
- [x] I have added tests to cover my changes
- [x] All new and existing tests passed
- [x] I have squashed my commits into logical units
- [x] My commit messages are in the proper format
## Additional Notes
This implementation:
- ✅ **Maintains full backward compatibility** - existing configurations work
unchanged
- ✅ **Follows APISIX patterns** - consistent with existing plugin
architecture
- ✅ **Comprehensive testing** - covers schema validation, error ratio calculation, state transitions, and backward compatibility
- ✅ **Performance optimized** - efficient sliding window implementation
- ✅ **Well documented** - updated both English and Chinese docs
The feature addresses real-world use cases for:
- High-traffic services with better error spike handling
- Variable traffic patterns with adaptive behavior
- Microservices architectures requiring precise circuit breaking
- SLA-based circuit breaking with configurable error rates
Ready for review and feedback!