Hi all,

Rate limiting was introduced to Polaris in September 2024 and has seen several updates since. Recently, a bug was identified in how we handle certain logic paths, and a fix has been proposed in a new PR.
In reviewing the recent PR and the underlying implementation, I've been thinking about a few architectural questions:

1. Correctness and Determinism: Does the current logic fully capture the deterministic properties required by the Token Bucket algorithm? For instance, by not accounting for fractional token regeneration, are we at risk of drifting from expected behavior?

2. Testing Strategy: Would it be beneficial to introduce deterministic clock simulation or property-based verification into our test suite? I'm concerned that our current reliance on the wall clock makes it difficult to catch edge cases in a CI environment.

3. Clock Sources: Since the wall clock can jump backward as well as forward, should we move toward a strictly monotonic clock to avoid arithmetic overflows or "infinite token" scenarios?

4. Leveraging Existing Libraries: Given the complexity of getting concurrency and synchronization right for high-throughput rate limiting, would it make sense to explore Guava's RateLimiter? It is already on our classpath, so we could benefit from its maturity, optimized synchronization, and sub-millisecond precision.

I'm also curious to hear your thoughts on whether we should address these additional areas:

* Cold Start: Should we support "warm-up" periods for rate limiters?
* Contention: Could we reduce synchronization overhead by leveraging a more mature implementation's critical paths?

Rather than continuing to patch the custom implementation, what do you all think about pausing development on TokenBucket and evaluating a migration to a more established library like Guava?

Robert
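To make questions 1–3 concrete, here is a minimal sketch (not the actual Polaris TokenBucket, and all names here are hypothetical) of a bucket that keeps fractional tokens instead of truncating them, reads time from an injectable monotonic source, and guards against the clock appearing to move backward. Because the clock is a plain `LongSupplier`, a test can drive it deterministically with no sleeps:

```java
import java.util.function.LongSupplier;

// Hypothetical sketch for discussion, not the Polaris implementation.
final class SketchTokenBucket {
    private final double capacity;
    private final double tokensPerNano;
    private final LongSupplier nanoClock; // e.g. System::nanoTime in production
    private double tokens;               // fractional tokens are retained
    private long lastNanos;

    SketchTokenBucket(double capacity, double tokensPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.tokensPerNano = tokensPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
        this.tokens = capacity;                     // start full
        this.lastNanos = nanoClock.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        long elapsed = Math.max(0L, now - lastNanos); // never regenerate on a backward jump
        lastNanos = now;
        // Fractional accumulation: even when elapsed * tokensPerNano < 1,
        // the partial token is kept rather than truncated away.
        tokens = Math.min(capacity, tokens + elapsed * tokensPerNano);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A deterministic test then advances a fake clock by hand: with capacity 1 and 2 tokens/second, advancing 250 ms regenerates half a token, which a truncating implementation would lose but this sketch keeps.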
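On question 4 and the cold-start bullet: Guava's RateLimiter already exposes both behaviors through its two factory methods, so a migration sketch is short. This is illustrative usage, not a proposed Polaris API; the rates chosen are arbitrary:

```java
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

// Illustrative only; rates are arbitrary placeholders.
final class GuavaLimiterDemo {
    // Steady limiter: ~100 permits/sec with Guava's default burst smoothing.
    static RateLimiter steady() {
        return RateLimiter.create(100.0);
    }

    // Warm-up limiter: ramps from a cold start up to the steady rate over
    // 3 seconds, which addresses the "cold start" question directly.
    static RateLimiter warmingUp() {
        return RateLimiter.create(100.0, 3, TimeUnit.SECONDS);
    }
}
```

Callers use `tryAcquire()` for a non-blocking check or `acquire()` to block (it returns the seconds spent waiting), so the contention question largely reduces to trusting Guava's internally synchronized critical path rather than maintaining our own.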
