maytasm commented on PR #17847:
URL: https://github.com/apache/druid/pull/17847#issuecomment-2779705752
@kfaraz
We want to be able to measure our e2e streaming ingestion latency and ensure
it stays within some bound x (for an SLA). For example, we may have a use case
where data must be available within x time after it is produced to Kafka.
Currently, Druid calculates ingest/events/messageGap as the time gap in
milliseconds between the latest ingested event timestamp (during that emission
period) and the current system timestamp at metrics emission. This yields
the **minimum** gap for the period, which is not very useful: we cannot measure,
track, or enforce our SLA with how ingest/events/messageGap is currently calculated.
Here is an example:
Emission period of 5 secs, from t0 to t4 (one second apart):
At t0, we have a message arriving with timestamp of t0-500 secs
At t1, we have a message arriving with timestamp of t0-499 secs
At t2, we have a message arriving with timestamp of t0-499 secs
At t3, we have a message arriving with timestamp of t0-499 secs
At t4, we have a message arriving with timestamp of t3
The above is an example where we process 5 rows of data. t3 is the latest
ingested event timestamp in this period.
When we emit the metric at t4, we would calculate the message gap as
t4 - t3 = 1 sec.
This disregards all the earlier late messages.
For example, in the above, if our SLA to our users is 5 secs, the
ingest/events/messageGap reported as 1 sec would make it seem like we are within
SLA, but in fact 80% of our messages are 500 seconds or more late!
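
To make the mismatch concrete, here is a minimal sketch of the example above. It is not Druid code: the `(arrival_time, event_timestamp)` tuples, the function name `current_message_gap`, and the choice of t0 = 0 with one-second steps are assumptions for illustration, and "latest ingested event timestamp" is modeled as the maximum event timestamp seen in the period.

```python
# Hypothetical model of the example: (arrival_time, event_timestamp) in seconds,
# taking t0 = 0 and reading "t-500" as 500 seconds before t0.
arrivals = [(0, -500), (1, -499), (2, -499), (3, -499), (4, 3)]
SLA_SECONDS = 5


def current_message_gap(rows, emission_time):
    """Gap between the emission time and the latest event timestamp in the period."""
    latest_event_ts = max(event_ts for _, event_ts in rows)
    return emission_time - latest_event_ts


# What the metric reports when emitted at t4.
reported = current_message_gap(arrivals, emission_time=4)

# How many messages actually arrived later than the SLA allows.
late = sum(1 for arrival, event_ts in arrivals if arrival - event_ts > SLA_SECONDS)

print(reported, late)  # 1 4
```

The single reported value of 1 sec sits inside the 5 sec SLA even though 4 of the 5 messages violate it.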
We want to improve it by:
- Calculate the messageGap for each message individually
  - e.g. at t0, we have a message arriving with timestamp of t0-500 secs. This
    should be recorded as a 500 sec messageGap
- Report either a distribution (not sure if this is possible with the Druid
  metric system) or a min/max/avg of the messageGaps we saw in an emission period
  (min/max/avg would still be useful)
  - In the above, we saw messageGaps of 500 seconds, 500 seconds,
    501 seconds, 502 seconds, and 1 second. We should report a min of 1s, a max
    of 502s, and an avg of 400.8s
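
The proposal can be sketched roughly as follows. This is only an illustration of the arithmetic, not the PR's implementation: `message_gap_stats` and the `(arrival_time, event_timestamp)` row format are hypothetical, with times in seconds and t0 = 0.

```python
def message_gap_stats(rows):
    """Compute per-message gaps (arrival time minus event timestamp) and summarize them."""
    gaps = [arrival - event_ts for arrival, event_ts in rows]
    return {
        "min": min(gaps),
        "max": max(gaps),
        "avg": sum(gaps) / len(gaps),
    }


# The five messages from the example, as (arrival_time, event_timestamp).
rows = [(0, -500), (1, -499), (2, -499), (3, -499), (4, 3)]
stats = message_gap_stats(rows)
print(stats)  # min 1, max 502, avg 400.8
```

With these three values, the max (or avg) would show the SLA breach that the single latest-timestamp gap hides.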