karthikgurram87 opened a new issue, #15655:
URL: https://github.com/apache/druid/issues/15655

   We have a Druid setup that consumes from a Kafka topic with approximately 
400 partitions. Since we migrated to Druid 25, the 
`ingest/kafka/partitionLag` and `ingest/kafka/maxLag` metrics have been 
getting dropped periodically at the Overlord. We don't see drops with other 
metrics. 
   
   A similar drop is happening with `ingest/notices/queueSize`, but it is 
noticed less because there is no alert on it. 
   
   We are using `StatsDEmitter` to send the metrics, and they eventually end up 
in Datadog. We ruled out the other places where the metrics could get dropped. 
We rely on `dogstatsd.client.packets_dropped` to see whether metrics are 
being dropped. The telemetry metrics available in the `StatsDProcessor` do 
not carry any tags, so the dropped packets cannot be associated with a 
specific node. A screenshot is attached: 
   <img width="774" alt="Screenshot 2024-01-10 at 3 01 06 PM" 
src="https://github.com/apache/druid/assets/136335354/427a729a-d693-4b28-9f97-d47f340e5217">
   
   `ingest/kafka/maxLag` is crucial to us as we rely on it extensively for 
alerting. We use `ingest/kafka/partitionLag` to identify the partitions that 
lag the most. 
   
   The metrics are not dropped if we disable the partitionLag metric. The high 
number of partitions causes some of the metrics to be dropped in the 
`StatsDSender` because the outbound queue frequently becomes full. 
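   
   To illustrate the failure mode, here is a simplified model of a bounded, 
non-blocking outbound queue. It is not the actual client code: the class name, 
the capacity of 256, and the absence of a draining consumer are all 
assumptions made for the sketch. It only shows that a burst of about 400 
per-partition metrics can overflow a queue that is smaller than the burst. 

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical model of a bounded outbound queue; not the real StatsD client internals.
final class BoundedQueueDropDemo {

    /**
     * Offers `burst` messages to a queue of size `capacity` while nothing drains it,
     * and returns how many offers were rejected (that is, dropped).
     */
    static int droppedInBurst(int capacity, int burst) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int dropped = 0;
        for (int i = 0; i < burst; i++) {
            // offer() never blocks; it returns false when the queue is full.
            if (!queue.offer("ingest/kafka/partitionLag partition=" + i)) {
                dropped++;
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        // 256 is an assumed capacity chosen for illustration only.
        System.out.println(droppedInBurst(256, 400));
    }
}
```

   In a real sender a consumer thread drains the queue concurrently, so the 
actual drop count depends on how fast the burst arrives relative to the drain 
rate. 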
   
   **Proposal**
   
   1. Emit partitionLag from a new `ScheduledExecutorService` with a 
configurable emissionPeriod in `SeekableStreamSupervisor`. Add a random delay 
in each iteration of the for loop, bounded so that the total delay stays below 
emissionPeriod.
   2. Pass the tags available in `StatsDEmitter` on to the telemetry metrics. 
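   
   A minimal sketch of the first proposal, assuming a standalone helper 
rather than the real `SeekableStreamSupervisor` code. The class name, the 
`lagSupplier` and `emit` parameters, and the way the per-partition delay is 
capped are all hypothetical. The cap is chosen so that the sum of the random 
delays is always below the emission period. 

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical helper; names and structure are assumptions, not Druid's API.
final class PartitionLagEmitterSketch {

    /** Largest per-partition delay such that the sum over all partitions is below the period. */
    static long maxDelayPerPartitionMillis(long emissionPeriodMillis, int partitionCount) {
        if (partitionCount <= 0) {
            return 0;
        }
        return Math.max(0, (emissionPeriodMillis - 1) / partitionCount);
    }

    /** Emits one value per partition, sleeping a random bounded delay after each emission. */
    static void emitOnce(Map<Integer, Long> lagByPartition,
                         long emissionPeriodMillis,
                         BiConsumer<Integer, Long> emit) {
        long maxDelay = maxDelayPerPartitionMillis(emissionPeriodMillis, lagByPartition.size());
        for (Map.Entry<Integer, Long> entry : lagByPartition.entrySet()) {
            emit.accept(entry.getKey(), entry.getValue());
            if (maxDelay > 0) {
                try {
                    // Random delay in [0, maxDelay] spreads the burst across the period.
                    Thread.sleep(ThreadLocalRandom.current().nextLong(maxDelay + 1));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return; // stop this round if the executor is shutting down
                }
            }
        }
    }

    /** Schedules emitOnce on a dedicated single-threaded executor; the caller shuts it down. */
    static ScheduledExecutorService start(Supplier<Map<Integer, Long>> lagSupplier,
                                          long emissionPeriodMillis,
                                          BiConsumer<Integer, Long> emit) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(
            () -> emitOnce(lagSupplier.get(), emissionPeriodMillis, emit),
            0, emissionPeriodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

   Note that `scheduleAtFixedRate` stops rescheduling a task that throws, so a 
production version would need to catch and log exceptions from `emit`. 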
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]