jugomezv opened a new issue, #10210:
URL: https://github.com/apache/pinot/issues/10210

   With [PR9994](https://github.com/apache/pinot/pull/9994) and 
[PR10121](https://github.com/apache/pinot/pull/10121) we introduced two 
new metrics that report last-hop (realtimeIngestionDelayMs) and end-to-end 
(endToEndIngestionDelayMs) ingestion delay in Pinot. After a few weeks of 
monitoring these metrics in our own deployments, and with some feedback 
from the OSS community, we have noticed the following edge cases that can 
result in non-intuitive measurements:
   
   **Scenario 1: All messages in a batch are filtered**
   messagesAndOffsets.getMessageCount() = 0 and 
messagesAndOffsets.getUnfilteredMessageCount() > 0
   In this case we currently report the ingestion delay of the last event 
that Pinot consumed successfully in a previous batch. If the stream 
delivers only filtered events for an extended period, this measure may not 
reflect the actual situation. We could set the delay to zero instead, but 
that would also be wrong under a steady stream of filtered events, because 
the queue of filtered events may be large. On the other hand, reporting a 
delay that keeps ramping up may be correct here, because Pinot is not 
actually consuming any events while everything is being filtered out.
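
   The condition above can be sketched as a small classifier over the two 
counts, together with the two reporting options discussed. This is only an 
illustration: the class, enum, and method names below are hypothetical and 
are not Pinot APIs, and it assumes getMessageCount() is the count after 
filtering while getUnfilteredMessageCount() is the count before filtering.

```java
class FilteredBatchSketch {
    // Hypothetical labels for the three kinds of batch a consumer can see.
    enum BatchKind { EMPTY, ALL_FILTERED, HAS_MESSAGES }

    // Hypothetical reporting options for an all-filtered batch.
    enum Policy { KEEP_AGING, RESET_TO_ZERO }

    /**
     * messageCount: messages remaining after filtering (assumed).
     * unfilteredMessageCount: messages fetched before filtering (assumed).
     */
    static BatchKind classify(int messageCount, int unfilteredMessageCount) {
        if (messageCount > 0) {
            return BatchKind.HAS_MESSAGES;
        }
        return unfilteredMessageCount > 0 ? BatchKind.ALL_FILTERED : BatchKind.EMPTY;
    }

    /** Value to report for an all-filtered batch under each option. */
    static long delayForAllFilteredBatch(Policy policy, long agedDelayMs) {
        // KEEP_AGING mirrors the current behavior described in this issue;
        // RESET_TO_ZERO is the alternative that hides a possible backlog.
        return policy == Policy.RESET_TO_ZERO ? 0L : agedDelayMs;
    }

    public static void main(String[] args) {
        System.out.println(classify(0, 25));   // ALL_FILTERED
        System.out.println(classify(0, 0));    // EMPTY
        System.out.println(classify(10, 25));  // HAS_MESSAGES
        System.out.println(delayForAllFilteredBatch(Policy.KEEP_AGING, 4_000L));    // 4000
        System.out.println(delayForAllFilteredBatch(Policy.RESET_TO_ZERO, 4_000L)); // 0
    }
}
```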
   
   **Scenario 2: All messages in a batch cause exceptions.**
   In this case we currently report the ingestion delay of the last 
non-filtered message that Pinot ingested successfully, aged by the time 
elapsed since that message was ingested. This seems correct given the 
semantics we intend for the metric: messages that fail to decode should not 
count as correctly ingested by Pinot, and the growing delay should be a 
good signal for users to check other metrics and deal with the errors. 
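
   As a rough sketch of what "aged" means here: the reported value is the 
delay observed when the last good message was ingested, plus the time 
elapsed since then. The tracker class and its method names below are 
hypothetical and only model the behavior described in this issue, not the 
actual Pinot implementation. Clock values are passed in explicitly so the 
arithmetic is easy to follow.

```java
class AgedDelaySketch {
    private long lastDelayAtIngestMs = 0;
    private long lastIngestTimeMs = -1; // -1 means nothing has been ingested yet

    /** Called only when a message is decoded and ingested successfully. */
    void onMessageIngested(long eventTimeMs, long ingestTimeMs) {
        lastDelayAtIngestMs = Math.max(0, ingestTimeMs - eventTimeMs);
        lastIngestTimeMs = ingestTimeMs;
    }

    /** Delay of the last good message plus the time elapsed since its ingestion. */
    long reportedDelayMs(long nowMs) {
        if (lastIngestTimeMs < 0) {
            return 0; // assumption: report zero until a first message is ingested
        }
        return lastDelayAtIngestMs + (nowMs - lastIngestTimeMs);
    }

    public static void main(String[] args) {
        AgedDelaySketch tracker = new AgedDelaySketch();
        tracker.onMessageIngested(1_000, 1_200); // delay at ingestion: 200 ms

        // Every later batch fails to decode, so onMessageIngested is never
        // called again and the reported delay keeps growing with the clock.
        System.out.println(tracker.reportedDelayMs(1_200)); // 200
        System.out.println(tracker.reportedDelayMs(6_200)); // 5200
    }
}
```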
   
   **Scenario 3: All transformed rows are empty.** 
   If we receive events correctly but reusedResult.getTransformedRows() 
is empty for every event in a batch, we report the ingestion delay of the 
last non-filtered message that was ingested successfully, aged by the time 
elapsed since it was ingested. An alternative would be to report the delay 
of the last message processed, regardless of whether getTransformedRows() 
returns an empty result.
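
   The difference between the two options can be sketched with a tracker 
that takes a flag deciding whether messages with no transformed rows still 
update the metric. Everything below is hypothetical (the class, the flag, 
and the use of List<String> as a stand-in for transformed rows); it only 
illustrates how far apart the two reported values can drift.

```java
import java.util.Collections;
import java.util.List;

class EmptyTransformSketch {
    private final boolean countEmptyTransforms;
    private long lastEventTimeMs = -1; // -1 means no message has updated the metric

    EmptyTransformSketch(boolean countEmptyTransforms) {
        this.countEmptyTransforms = countEmptyTransforms;
    }

    /** Updates the tracked event time according to the chosen option. */
    void onMessageProcessed(long eventTimeMs, List<String> transformedRows) {
        if (!transformedRows.isEmpty() || countEmptyTransforms) {
            lastEventTimeMs = eventTimeMs;
        }
    }

    /** Aged delay: time between the tracked event and now. */
    long reportedDelayMs(long nowMs) {
        return lastEventTimeMs < 0 ? 0 : Math.max(0, nowMs - lastEventTimeMs);
    }

    public static void main(String[] args) {
        EmptyTransformSketch current = new EmptyTransformSketch(false);
        EmptyTransformSketch alternative = new EmptyTransformSketch(true);

        for (EmptyTransformSketch tracker : List.of(current, alternative)) {
            tracker.onMessageProcessed(1_000, Collections.singletonList("row"));
            tracker.onMessageProcessed(5_000, Collections.emptyList());
        }

        System.out.println(current.reportedDelayMs(6_000));     // 5000
        System.out.println(alternative.reportedDelayMs(6_000)); // 1000
    }
}
```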
   
   

