gavinchou opened a new pull request, #64038:
URL: https://github.com/apache/doris/pull/64038

   ### What problem does this PR solve?
   
   Add observability for S3 local rate limiting and S3 429 (Too Many Requests) 
responses so throttling-related performance issues can be diagnosed and alerted 
on more directly.
   
   ### What is changed?
   
   - Add explicit S3 local rate limiter bvars for sleep duration, sleep count, 
and rejected count.
   - Rename the old ambiguous S3 local rate limiter bvars so their meaning is 
clear.
   - Log the first and every Nth request that the local S3 rate limiter 
throttles or rejects; the decision to log does not read bvar values.
   - Add S3 429 retry/failure bvars for S3/Azure object clients.
   - Update S3 rate limiter unit test coverage for the sleep and rejected metrics.
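
One way to make the "first and every Nth" logging decision without consulting a bvar is to keep a separate atomic counter next to the metric. The sketch below only illustrates that idea: `SampledEventCounter` and `record()` are hypothetical names, not identifiers from this PR, and the interval N is a constructor argument because the PR text does not state its value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): decides whether an event should be
// logged. It owns its counter, so the decision never reads a bvar value.
class SampledEventCounter {
public:
    explicit SampledEventCounter(uint64_t every_n) : _every_n(every_n) {
        assert(every_n > 0);
    }

    // Records one event. Returns the 1-based index of the event if it should
    // be logged (the first event, then every N-th), or 0 if it should not.
    uint64_t record() {
        uint64_t n = _count.fetch_add(1, std::memory_order_relaxed) + 1;
        return (n == 1 || n % _every_n == 0) ? n : 0;
    }

private:
    const uint64_t _every_n;
    std::atomic<uint64_t> _count{0};
};
```

A caller could update the bvar unconditionally and emit a log line only when `record()` returns a nonzero index, which is then available to print as the running count.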
   
   ### New or renamed bvar metrics
   
   Renamed local S3 GET limiter metrics:
   
   - `get_rate_limit_ns` -> `s3_get_rate_limit_sleep_ns`
   - `get_rate_limit_exceed_req_num` -> `s3_get_rate_limit_sleep_count`
   
   Renamed local S3 PUT limiter metrics:
   
   - `put_rate_limit_ns` -> `s3_put_rate_limit_sleep_ns`
   - `put_rate_limit_exceed_req_num` -> `s3_put_rate_limit_sleep_count`
   
   New local S3 limiter rejection metrics:
   
   - `s3_get_rate_limit_rejected_count`
   - `s3_put_rate_limit_rejected_count`
   
   New S3 429 metrics:
   
   - `s3_request_retry_too_many_requests_count`
   - `s3_request_failed_too_many_requests_count`
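
To make the difference between the two 429 counters concrete, here is a minimal sketch under one assumed reading of their names: a 429 response that is followed by another attempt counts as a retry, and a request whose final attempt still returns 429 counts as a failure. The function `send_with_429_accounting`, the retry policy, and the plain atomics standing in for bvars are all hypothetical; the PR text does not describe how the object clients actually hook into their retry logic, and backoff is omitted here for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Stand-ins for the two bvars named above; plain atomics keep the sketch
// self-contained. The counting rules below are an assumption.
std::atomic<uint64_t> retry_too_many_requests_count{0};
std::atomic<uint64_t> failed_too_many_requests_count{0};

constexpr int kHttpTooManyRequests = 429;

// Calls `send` (which returns an HTTP status code) up to `max_attempts` times.
// Hypothetical policy: only 429 is retried; any other status returns at once.
int send_with_429_accounting(const std::function<int()>& send, int max_attempts) {
    assert(max_attempts >= 1);
    int status = 0;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        status = send();
        if (status != kHttpTooManyRequests) return status;
        if (attempt < max_attempts) {
            // This 429 will be followed by another attempt.
            retry_too_many_requests_count.fetch_add(1, std::memory_order_relaxed);
        }
    }
    // Attempts are exhausted and the last response was still 429.
    failed_too_many_requests_count.fetch_add(1, std::memory_order_relaxed);
    return status;
}
```

Under this reading, a retry count that grows while the failure count stays flat would indicate throttling that retries are absorbing, whereas a growing failure count would indicate requests that are actually being lost to 429s.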
   
   Metric examples:
   
   ```text
   s3_get_rate_limit_sleep_ns : 123456789
   s3_get_rate_limit_sleep_count : 42
   s3_get_rate_limit_rejected_count : 3
   s3_put_rate_limit_sleep_ns : 987654321
   s3_put_rate_limit_sleep_count : 27
   s3_put_rate_limit_rejected_count : 1
   s3_request_retry_too_many_requests_count : 8
   s3_request_failed_too_many_requests_count : 2
   ```
   
   Example local limiter logs:
   
   ```text
   S3 GET request is throttled by local rate limiter, sleep_ms=12, 
sleep_count=1000, token_per_second=1000, bucket_tokens=2000, token_limit=5000
   S3 PUT request is rejected by local rate limiter, rejected_count=1, 
token_per_second=1000, bucket_tokens=2000, token_limit=5000
   ```
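
The three parameters in these log lines suggest a token bucket with an overall cap. The sketch below shows how "throttled" and "rejected" could fall out of such a design. It is an illustration, not the limiter in this PR, and it assumes that `token_per_second` is the refill rate, `bucket_tokens` is the burst capacity, and `token_limit` is a cap on the total number of tokens ever granted (0 meaning unlimited). `LocalRateLimiter`, `Decision`, and `acquire()` are hypothetical names, and time is passed in explicitly so the behavior is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Outcome of asking the limiter for one token.
struct Decision {
    bool rejected;     // true once the total token limit is exhausted
    int64_t sleep_ns;  // how long the caller should sleep; 0 if not throttled
};

// Hypothetical token bucket; parameter meanings are assumptions (see above).
class LocalRateLimiter {
public:
    LocalRateLimiter(int64_t token_per_second, int64_t bucket_tokens, int64_t token_limit)
            : _rate(static_cast<double>(token_per_second)),
              _bucket(static_cast<double>(bucket_tokens)),
              _tokens(static_cast<double>(bucket_tokens)),
              _limit(token_limit) {
        assert(token_per_second > 0 && bucket_tokens > 0 && token_limit >= 0);
    }

    // Requests one token at time `now_ns` (monotonic nanoseconds).
    Decision acquire(int64_t now_ns) {
        if (_limit > 0 && _granted >= _limit) return {true, 0};  // rejected

        // Refill according to elapsed time, capped at the burst capacity.
        double elapsed_s = static_cast<double>(now_ns - _last_ns) / 1e9;
        _tokens = std::min<double>(_bucket, _tokens + elapsed_s * _rate);
        _last_ns = now_ns;

        _tokens -= 1.0;
        ++_granted;
        if (_tokens >= 0.0) return {false, 0};  // token available, no sleep

        // Bucket is in deficit: sleep long enough for the refill to cover it.
        int64_t sleep_ns = static_cast<int64_t>(-_tokens / _rate * 1e9);
        return {false, sleep_ns};  // throttled
    }

private:
    double _rate;
    double _bucket;
    double _tokens;
    int64_t _limit;
    int64_t _granted = 0;
    int64_t _last_ns = 0;
};
```

In this model a throttled request is still served after sleeping, which is why it would feed the sleep metrics, while a rejected request is never served and would feed only the rejected counter.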
   
   ### Tests
   
   - `sh run-cloud-ut.sh --run --filter=s3_rate_limiter_test:*`
   - `sh run-be-ut.sh --run --filter=S3FileWriterTest.*`
   
   Note: an earlier BE run used an incorrect lowercase filter 
(`s3_file_writer_test:*`), which matched no suite and started running the full 
BE suite; it was stopped and replaced by the correct `S3FileWriterTest.*` run 
above.
   

