jasonk000 opened a new issue #11279:
URL: https://github.com/apache/druid/issues/11279


   When configured with a `minHttpTimeoutMillis`, the coordinator node's HttpPostEmitter thread becomes pegged at 100% CPU when large batches arrive followed by an idle period. Alternatively, when no minimum timeout is set, submissions to the backend fail due to very short timeout periods (2ms).
   
   ### Affected Version
   
   0.20.0
   
   ### Description
   
   HttpPostEmitter has some heuristic logic to batch events and ship them in an efficient and timely fashion, while also trying to throttle itself somewhat.
   
   When a large batch of data is received in too short a time frame, the heuristics arrive at a state in which the emitter gets stuck in a busy-loop, unable to make forward progress. This persists until enough normal messages have accrued to reset the `lastBatchFillTimeMillis` duration.
   
   I can detect this as follows:
   - Configuration set as follows 
   ```
   # Emit metrics over http
   druid.emitter=http
   druid.emitter.http.recipientBaseUrl=http://localhost:8078/druidmetrics
   druid.emitter.http.minHttpTimeoutMillis=100
   ```
   - Debug logs on, and look for `Failed to send events to 
url[http://localhost:8078/druidmetrics] with timeout less than minimum`
   - CPU up to 100%
   - Monitor the logs of the downstream service (i.e. the receiver of the HTTP POSTs): one POST arrives and then no more for some period of time.
   - Take a heap dump to inspect the emitter state: many buffers are present in the failed buffer queue.
   - Heap analysis also shows `lastBatchFillTimeMillis` = 2 and `approximateXxxToEmitCount` = 0.
   
   ### Current logic
   
   With reference to the logic 
[here](https://github.com/apache/druid/blob/8296123d895db7d06bc4517db5e767afb7862b83/core/src/main/java/org/apache/druid/java/util/emitter/core/HttpPostEmitter.java#L501-L521)
 and 
[here](https://github.com/apache/druid/blob/8296123d895db7d06bc4517db5e767afb7862b83/core/src/main/java/org/apache/druid/java/util/emitter/core/HttpPostEmitter.java#L708-L716):
   - As data arrives, it is batched. Once a batch is filled, the fill duration is captured in `lastBatchFillTimeMillis` for later use, and the batch is submitted to the endpoint.
   - The overall flow is a `while(true) { readFromQueueAndSubmitBatch(); }` loop, with some ability to park failed messages for resubmission.
   - In all cases, control flows through `sendWithRetries()` into `send()`.
   - In `send()`, [`computeTimeoutForSendRequestInMillis`](https://github.com/apache/druid/blob/8296123d895db7d06bc4517db5e767afb7862b83/core/src/main/java/org/apache/druid/java/util/emitter/core/HttpPostEmitter.java#L805-L810) calculates a timeout based on how long the last batch took to fill and how much is waiting in the queue.
   - `send()` then compares that timeout to the configuration item `minHttpTimeoutMillis`; if the calculated timeout is lower, the submission is stalled.
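   The timeout decision above can be sketched as follows. This is a minimal approximation; the allowance curve, class name, and method names are assumptions for illustration, not Druid's actual code:

   ```java
   /**
    * Sketch of the timeout heuristic described above. The allowance curve
    * and all names here are assumptions, not Druid's actual implementation.
    */
   public class EmitterTimeoutSketch
   {
       // druid.emitter.http.minHttpTimeoutMillis from the configuration above
       static final long MIN_HTTP_TIMEOUT_MILLIS = 100;

       /** Assumed allowance: 2.0f for a modest queue, shrinking as it grows. */
       static double allowanceFactor(int buffersInQueue)
       {
           return buffersInQueue < 25 ? 2.0 : Math.max(1.0, 2.0 * 25.0 / buffersInQueue);
       }

       /** Timeout scales with how long the previous batch took to fill. */
       static long computeTimeoutMillis(long lastBatchFillTimeMillis, int buffersInQueue)
       {
           return (long) (lastBatchFillTimeMillis * allowanceFactor(buffersInQueue));
       }

       /** send() stalls the request when the computed timeout is below the minimum. */
       static boolean wouldSend(long lastBatchFillTimeMillis, int buffersInQueue)
       {
           return computeTimeoutMillis(lastBatchFillTimeMillis, buffersInQueue) >= MIN_HTTP_TIMEOUT_MILLIS;
       }

       public static void main(String[] args)
       {
           // After a burst: 2ms fill time * 2.0 allowance = 4ms, below the 100ms minimum
           System.out.println(wouldSend(2, 5));     // stalled
           // Normal throughput: a 15818ms fill time easily clears the minimum
           System.out.println(wouldSend(15818, 5)); // sent
       }
   }
   ```

   Under these assumptions the computed timeout can never reach the 100ms minimum while `lastBatchFillTimeMillis` is tiny, regardless of queue depth.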
   
   ### Issue
   
   - When a large burst of data arrives at the emitter, filling more than a single buffer, `lastBatchFillTimeMillis` becomes very short (say, 2ms). The burst then stops, so no more messages arrive to completely fill a batch for "some time" (maybe tens of seconds).
   - As long as the queue depth is modest (< 25 in this case), the calculated timeout will _always be < 100ms_ (lastBatchFillTime * allowance), and so _the send operation will always be rejected_ and deferred. The allowance factor is always `2.0f` or less as the queue grows, which gives a best case of 2ms * 2.0f = 4ms, drastically less than the minimum 100ms.
   - The message is immediately failed and eventually accumulates in the failed queue.
   - Once a "normal throughput" batch completes, the last batch fill time returns to a more typical value (15818ms in the captured analysis).
   - On the next iteration, the failed submission queue attempts a send, and the calculated timeout is now greater than the minimum 100ms. Submission proceeds: very quickly all failed batches are submitted to the backend and CPU drops back to normal (~0% for HttpPostEmitter).
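   The stall-and-recover sequence above can be simulated. The fixed 2.0f allowance, the queue size, and the fill times are assumptions taken from the numbers in this report, not from Druid's code:

   ```java
   /**
    * Simulation of the stall-and-recover sequence described above. The fixed
    * 2.0 allowance and queue size are assumptions based on this report.
    */
   public class BusyLoopSimulation
   {
       static final long MIN_HTTP_TIMEOUT_MILLIS = 100;

       // Best-case computed timeout: lastBatchFillTime * 2.0f allowance
       static long computeTimeoutMillis(long lastBatchFillTimeMillis)
       {
           return (long) (lastBatchFillTimeMillis * 2.0);
       }

       /** Returns how many buffers remain unsent after one pass over the queue. */
       static int attemptSends(int buffersInQueue, long lastBatchFillTimeMillis)
       {
           int failed = 0;
           for (int i = 0; i < buffersInQueue; i++) {
               // A rejected send parks the buffer on the failed queue; the
               // emitter's while(true) loop never blocks, so the thread spins.
               if (computeTimeoutMillis(lastBatchFillTimeMillis) < MIN_HTTP_TIMEOUT_MILLIS) {
                   failed++;
               }
           }
           return failed;
       }

       public static void main(String[] args)
       {
           // Phase 1: the burst leaves lastBatchFillTimeMillis = 2, so every
           // attempt computes 4ms < 100ms and is rejected.
           int failed = attemptSends(25, 2);
           System.out.println("failed after burst: " + failed);

           // Phase 2: a normal batch resets the fill time to 15818ms; the
           // computed timeout now clears the minimum and the queue drains.
           int remaining = attemptSends(failed, 15818);
           System.out.println("failed after recovery: " + remaining);
       }
   }
   ```

   With the 2ms fill time every one of the 25 buffers fails; after the fill time resets to 15818ms, the same pass drains them all.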
   
   ### Attempts to work around
   - Setting `.minHttpTimeoutMillis` between 1 and 4ms results in a high rate of failure under normal operations.
   - Setting `.maxBatchSize` smaller does help reduce the problem, however the lower limit is strictly enforced [here](https://github.com/apache/druid/blob/8296123d895db7d06bc4517db5e767afb7862b83/core/src/main/java/org/apache/druid/java/util/emitter/core/HttpPostEmitter.java#L157-L164): `maxBatchSize must be greater than MAX_EVENT_SIZE[1,047,552] + overhead`.
   - Setting `.flushCount` to a lower number (eg: `50` instead of the default `500`) means the busy loop resolves a little quicker (30-60sec), but makes a buffer-filling event more likely.
   - Clearing `minHttpTimeoutMillis` (default `0`) shows a different error under the same circumstances, fundamentally due to the same cause (the timeout heuristic is calculated from the previous batch fill time, and that fill was short): `Request timeout .. after 2ms`:
   ```
   @4000000060a6f7350c20a31c Caused by: java.util.concurrent.TimeoutException: 
Request timeout to localhost/127.0.0.1:8078 after 2 ms
   @4000000060a6f7350c20a31c    at 
org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
 ~[async-http-client-2.5.3.jar:?]
   @4000000060a6f7350c20aed4    at 
org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
 ~[async-http-client-2.5.3.jar:?]
   @4000000060a6f7350c20aed4    at 
io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
 ~[netty-common-4.1.48.Final.jar:4.1.48.Final]
   @4000000060a6f7350c20b2bc    at 
io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
 ~[netty-common-4.1.48.Final.jar:4.1.48.Final]
   @4000000060a6f7350c20b2bc    at 
io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472) 
~[netty-common-4.1.48.Final.jar:4.1.48.Final]
   @4000000060a6f7350c20b2bc    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
   ```
   

