suvodeep-pyne opened a new pull request, #17239:
URL: https://github.com/apache/pinot/pull/17239
## Summary
Introduces the `PinotThrottledLogger` utility, which provides rate-limited
exception logging for record transformers. It solves the "Noisy Neighbor"
problem, where high-frequency errors consume the entire log quota and starve
low-frequency critical errors.
### Key Changes
- **New `PinotThrottledLogger` utility**
  (`pinot-common/src/main/java/org/apache/pinot/common/utils/PinotThrottledLogger.java`):
  - Implements per-exception-class rate limiting using Guava's `RateLimiter`
  - Each exception type gets an independent rate limiter (prevents the noisy
    neighbor problem)
  - Tracks and reports suppression counts when the rate limit lifts
  - Thread-safe implementation using `ConcurrentHashMap` and `AtomicLong`
  - Falls back to DEBUG logging when rate=0 (backward-compatible behavior)
- **New metric**: `ServerMeter.LOGS_DROPPED_BY_THROTTLED_LOGGER` for
  observability
- **Updated 6 transformers** to use the throttled logger:
  - `FilterTransformer`
  - `DataTypeTransformer`
  - `ExpressionTransformer`
  - `ComplexTypeTransformer`
  - `SchemaConformingTransformer`
  - `TimeValidationTransformer`
- **New configuration**:
  `IngestionConfig.ingestionExceptionLogRateLimitPerMin` (default: 0)
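
For reviewers who want a mental model of the per-exception-class behavior listed above, here is a minimal, self-contained sketch. It is not the code in this PR: the class name `ThrottledLoggerSketch`, the `tryLog` method, and the injected clock are all hypothetical, and a simple fixed one-minute window guarded by `synchronized` stands in for Guava's `RateLimiter` and `AtomicLong` so that the example runs on the JDK alone.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch of per-exception-class throttling; not the PR's PinotThrottledLogger. */
final class ThrottledLoggerSketch {
  /** State kept per exception class: a fixed one-minute window and a suppression count. */
  private static final class Bucket {
    long windowStartMs;
    int loggedInWindow;
    long suppressed;
  }

  private static final long WINDOW_MS = 60_000L;

  private final int maxLogsPerMinute;
  private final LongSupplier clockMs; // injected so the window can be tested deterministically
  private final ConcurrentHashMap<Class<?>, Bucket> buckets = new ConcurrentHashMap<>();

  ThrottledLoggerSketch(int maxLogsPerMinute, LongSupplier clockMs) {
    this.maxLogsPerMinute = maxLogsPerMinute;
    this.clockMs = clockMs;
  }

  /** Returns the line that would be logged, or null when the call is suppressed. */
  String tryLog(String message, Throwable t) {
    if (maxLogsPerMinute <= 0) {
      // Rate 0 means throttling is off: fall back to DEBUG, as described in the PR.
      return "DEBUG " + message + ": " + t;
    }
    // One bucket per exception class, so a noisy class cannot use up another class's budget.
    Bucket bucket = buckets.computeIfAbsent(t.getClass(), k -> new Bucket());
    long now = clockMs.getAsLong();
    synchronized (bucket) {
      if (now - bucket.windowStartMs >= WINDOW_MS) {
        bucket.windowStartMs = now;
        bucket.loggedInWindow = 0;
      }
      if (bucket.loggedInWindow >= maxLogsPerMinute) {
        bucket.suppressed++;
        return null;
      }
      bucket.loggedInWindow++;
      long dropped = bucket.suppressed;
      bucket.suppressed = 0;
      // Report how many occurrences were dropped since the last emitted line.
      String prefix = dropped > 0
          ? "Suppressed " + dropped + " occurrences of " + t.getClass().getSimpleName() + "; "
          : "";
      return "WARN " + prefix + message + ": " + t;
    }
  }

  public static void main(String[] args) {
    ThrottledLoggerSketch log = new ThrottledLoggerSketch(5, System::currentTimeMillis);
    for (int i = 0; i < 10_000; i++) {
      String line = log.tryLog("Caught exception while executing filter function",
          new NumberFormatException("For input string: \"abc\""));
      if (line != null) {
        System.out.println(line);
      }
    }
    // A different exception class has its own budget and is not starved.
    System.out.println(log.tryLog("Caught exception while connecting",
        new java.net.ConnectException("refused")));
  }
}
```

Keying the map on the exception class is what keeps a flood of one exception type from consuming the budget of another; the actual utility may differ in how it measures time and emits the suppression summary.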
### Backward Compatibility
- Default rate limit is 0, which disables throttling and falls back to DEBUG
logging (original behavior)
- No behavior change for existing deployments unless explicitly configured
### Benefits
1. **Solves Noisy Neighbor Problem**: High-frequency `NumberFormatException`
won't starve low-frequency `ConnectException`
2. **Production Visibility**: When rate > 0, exceptions log at WARN/ERROR
level instead of DEBUG
3. **Controlled Log Volume**: Prevents log flooding during data quality
issues
4. **Observability**: Per-table metric tracks dropped logs
### Example Output
```
WARN [FilterTransformer] Caught exception while executing filter function...
java.lang.NumberFormatException: For input string: "abc"
[... 4 more similar logs within 1 minute ...]
[After rate limit window passes]
WARN [FilterTransformer] ... Suppressed 9995 occurrences of
NumberFormatException ...
WARN [FilterTransformer] Caught exception while executing filter function...
```
Meanwhile, different exception types (e.g., `ConnectException`) log
immediately using independent rate limiters.
### Testing
- Added comprehensive unit tests covering:
  - Independent rate limiting per exception class
  - Backward compatibility (null config → DEBUG)
  - Explicit zero rate → DEBUG fallback
  - Thread safety under concurrent load

All tests pass: 4/4 in `PinotThrottledLoggerTest`
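
As a rough illustration of what a concurrent-load check on the `ConcurrentHashMap` plus `AtomicLong` combination can look like, the sketch below hammers a per-class counter from several threads and returns the total, which should equal the number of calls if no update is lost. The class `ConcurrentCountSketch` and its method are hypothetical and are not taken from `PinotThrottledLoggerTest`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a concurrent-load check; not the PR's test code. */
final class ConcurrentCountSketch {
  /** Increments a per-exception-class counter from many threads and returns the final count. */
  static long countConcurrently(int threads, int callsPerThread) {
    ConcurrentHashMap<Class<?>, AtomicLong> counts = new ConcurrentHashMap<>();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
      pool.submit(() -> {
        try {
          start.await(); // release all workers together to maximize contention
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int j = 0; j < callsPerThread; j++) {
          // computeIfAbsent is atomic, so every thread shares one AtomicLong per class.
          counts.computeIfAbsent(NumberFormatException.class, k -> new AtomicLong())
              .incrementAndGet();
        }
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return counts.get(NumberFormatException.class).get();
  }

  public static void main(String[] args) {
    System.out.println(countConcurrently(8, 10_000));
  }
}
```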
### Implementation Details
- API simplified: Transformers pass `IngestionConfig` directly to logger
- Rate conversion logic (per-min → per-sec) encapsulated in logger
- Metric emission only when table name is provided
- Memory is bounded in practice: one rate limiter is kept per exception class, and the number of distinct classes is typically 10-50
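
The per-minute to per-second conversion mentioned above is simple arithmetic, but the zero and null cases need explicit handling because Guava's `RateLimiter.create(double)` rejects non-positive rates. The sketch below shows one way to express that; `RateConversionSketch` and its method names are hypothetical, and the null handling is an assumption based on the "null config → DEBUG" test case.

```java
/** Hypothetical sketch of the per-minute to per-second conversion; not the PR's code. */
final class RateConversionSketch {
  /** Sentinel meaning "throttling disabled, use the DEBUG fallback". */
  static final double DISABLED = 0.0;

  /**
   * Converts a configured logs-per-minute value into permits per second.
   * A null or non-positive value is treated as disabled (assumption).
   */
  static double toPermitsPerSecond(Integer logsPerMinute) {
    if (logsPerMinute == null || logsPerMinute <= 0) {
      return DISABLED;
    }
    return logsPerMinute / 60.0;
  }

  /** True when the result could be handed to a limiter that requires a positive rate. */
  static boolean isThrottlingEnabled(Integer logsPerMinute) {
    return toPermitsPerSecond(logsPerMinute) > 0.0;
  }

  public static void main(String[] args) {
    System.out.println(toPermitsPerSecond(30));
    System.out.println(isThrottlingEnabled(null));
  }
}
```

For example, a setting of 30 logs per minute corresponds to 0.5 permits per second, while the default of 0 never reaches the limiter at all.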
## Test plan
- [x] Unit tests pass
- [x] Compilation successful across all modules
- [ ] Manual testing with real ingestion workload
- [ ] Verify metric emission in production-like environment