jugomezv opened a new issue, #10210: URL: https://github.com/apache/pinot/issues/10210
With [PR9994](https://github.com/apache/pinot/pull/9994) and [PR10121](https://github.com/apache/pinot/pull/10121) we introduced two new metrics that report last-hop (realtimeIngestionDelayMs) and end-to-end (endToEndIngestionDelayMs) ingestion delay in Pinot. After a few weeks of monitoring these metrics in our own deployments, and with some feedback from the OSS community, we have noticed the following edge cases that can produce non-intuitive measurements:

**Scenario 1: All messages in a batch are filtered.**

messagesAndOffsets.getMessageCount() = 0 and messagesAndOffsets.getUnfilteredMessageCount() > 0

In this case we currently report the ingestion delay for the last event that Pinot consumed correctly in the previous batch. If the stream continuously delivers batches in which every event is filtered, this measure may not reflect the actual situation. We could set the delay to zero in this case, but that would be incorrect when a steady stream of filtered events keeps arriving, because the queue of filtered events may be large. On the other hand, it may be correct to report a ramping delay here, because filtering means Pinot has not actually consumed anything for a while.

**Scenario 2: All messages in a batch cause exceptions.**

In this case we currently report the ingestion delay for the last unfiltered message that Pinot ingested correctly, aged by the time elapsed since its ingestion time. This seems correct under the semantics we intend for the metric: if a number of messages fail to decode, we should not count them as correctly ingested by Pinot, and the increasing delay should be a good signal for users to check other metrics and deal with the errors.
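The "aged" delay described in Scenarios 1 and 2 can be sketched as follows. This is a minimal illustration, not Pinot's actual tracker: the class `AgedDelayTracker`, its methods, and the choice to report 0 before anything has been indexed are all assumptions made for the example. The reference timestamp only moves when a batch yields at least one indexed message, so fully filtered or fully failing batches leave it untouched and the reported delay ramps up with wall-clock time.

```java
import java.util.function.LongSupplier;

/** Hypothetical tracker for illustration only; not Pinot's implementation. */
final class AgedDelayTracker {
    private final LongSupplier clock;       // injected clock keeps the example deterministic
    private long lastIngestionTimeMs = -1;  // -1 means no message has been indexed yet

    AgedDelayTracker(LongSupplier clock) {
        this.clock = clock;
    }

    /**
     * Called once per fetched batch.
     *
     * @param indexedCount               messages from the batch that were actually indexed
     * @param lastIndexedIngestionTimeMs ingestion timestamp of the last indexed message
     *                                   (ignored when indexedCount is 0)
     */
    void onBatch(int indexedCount, long lastIndexedIngestionTimeMs) {
        if (indexedCount > 0) {
            lastIngestionTimeMs = lastIndexedIngestionTimeMs;
        }
        // All filtered or all failed: keep the old timestamp so the delay keeps aging.
    }

    /** Delay of the last indexed message, aged by the time elapsed since then. */
    long delayMs() {
        if (lastIngestionTimeMs < 0) {
            return 0; // assumption: report 0 until something has been indexed
        }
        return Math.max(0, clock.getAsLong() - lastIngestionTimeMs);
    }
}
```

With a batch of three indexed messages stamped at 900 ms and a clock at 1,000 ms, `delayMs()` is 100. If the clock then advances to 5,000 ms and the next batch indexes nothing, `delayMs()` becomes 4,100 even though that batch carried recent events.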

**Scenario 3: All transformed rows are empty.**

If we receive events correctly but reusedResult.getTransformedRows() is empty for every event in a batch, we report the ingestion delay for the last unfiltered message that was ingested correctly, aged by the time elapsed since its ingestion time. An alternative would be to report the delay for the last message processed, regardless of whether getTransformedRows() returned an empty result.
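
To make the two options for Scenario 3 concrete, here is a small hedged sketch that compares them. The enum `DelayPolicy`, the class `Scenario3Sketch`, and both method names are hypothetical and exist only for this illustration; they do not correspond to Pinot code. The only difference between the policies is whether a message that produced no transformed rows is allowed to advance the reference timestamp.

```java
/** Hypothetical policies for choosing the timestamp behind the delay metric. */
enum DelayPolicy {
    LAST_INDEXED_ROW,        // current behavior described in the issue
    LAST_PROCESSED_MESSAGE   // alternative raised in the issue
}

/** Illustration only; names and structure are assumptions. */
final class Scenario3Sketch {
    private Scenario3Sketch() {
    }

    /** Returns the reference timestamp to use after processing one message. */
    static long nextReferenceTimeMs(DelayPolicy policy, long currentReferenceMs,
                                    long messageIngestionTimeMs, int transformedRowCount) {
        boolean producedRows = transformedRowCount > 0;
        if (producedRows || policy == DelayPolicy.LAST_PROCESSED_MESSAGE) {
            return messageIngestionTimeMs;
        }
        // No rows under LAST_INDEXED_ROW: keep the old reference so the delay ages.
        return currentReferenceMs;
    }

    /** Delay relative to the reference timestamp, never negative. */
    static long delayMs(long nowMs, long referenceMs) {
        return Math.max(0, nowMs - referenceMs);
    }
}
```

For a reference timestamp of 1,000 ms, a message stamped at 9,000 ms that yields no rows, and a clock at 10,000 ms, `LAST_INDEXED_ROW` reports a delay of 9,000 ms while `LAST_PROCESSED_MESSAGE` reports 1,000 ms.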
