The idea I am trying right now is:
1. Add a waitTimeMS field to FetchResponse.
2. If the fetch has to wait in purgatory because of either
replica.fetch.wait.max.ms or fetch.min.bytes, the broker fills in
waitTimeMS in the FetchResponse.
3. In the updateRequestMetrics() function, special-case the Fetch
response and subtract waitTimeMS from RemoteTime and TotalTime.
Let me know if you have any suggestions or feedback. I would like to propose
a KIP for this change.
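
A rough sketch of the accounting in step 3, in case it helps the discussion.
This is not the actual broker code: the class FetchLatencySketch, the
RequestTimes holder and the excludeWait() helper are hypothetical names made
up for illustration, and waitTimeMs stands for the proposed new field. It
only shows the subtraction, clamped so that a bad wait value cannot produce
negative times.

```java
// Hypothetical sketch only; none of these names exist in Kafka today.
final class FetchLatencySketch {

    /** Remote and total time recorded for one request, in milliseconds. */
    static final class RequestTimes {
        final long remoteTimeMs;
        final long totalTimeMs;

        RequestTimes(long remoteTimeMs, long totalTimeMs) {
            this.remoteTimeMs = remoteTimeMs;
            this.totalTimeMs = totalTimeMs;
        }
    }

    /**
     * Returns the times with the purgatory wait removed.
     * The wait is clamped to [0, min(remote, total)] so neither result
     * can go negative if the reported wait is larger than expected.
     */
    static RequestTimes excludeWait(RequestTimes recorded, long waitTimeMs) {
        long upperBound = Math.min(recorded.remoteTimeMs, recorded.totalTimeMs);
        long wait = Math.max(0L, Math.min(waitTimeMs, upperBound));
        return new RequestTimes(recorded.remoteTimeMs - wait,
                                recorded.totalTimeMs - wait);
    }
}
```

For example, a fetch that sat in purgatory for 500 ms and was recorded with
RemoteTime 505 ms and TotalTime 520 ms would be reported as 5 ms and 20 ms.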


On Sat, Apr 24, 2021 at 6:09 PM Israel Ekpo <israele...@gmail.com> wrote:

> Hi Ming
>
> This would be a useful metric from a monitoring perspective, especially
> when troubleshooting or diagnosing issues.
>
> Are you looking to modify the Admin API to add this capability?
> The metrics for quorum controllers, brokers, replicas and consumers may
> need to be reported differently.
>
> I am interested in this capability as well.
>
> Maybe there is something in the current Admin API that is not obvious yet
> so I will need to investigate first and will get back to you with my
> thoughts/suggestions.
>
> Thanks for bringing this up
>
> Cheers
>
>
>
> On Sat, Apr 24, 2021 at 1:21 PM Ming Liu <minga...@gmail.com> wrote:
>
>> Hi All,
>>      I am thinking about starting a KIP to report the "REAL"
>> broker/consumer fetch latency. Before that, I would like to collect any
>> ideas or suggestions. I created
>> https://issues.apache.org/jira/browse/KAFKA-12713.
>>      Fetch latency is an important metric to monitor for cluster
>> performance. With ACK=ALL, the produce latency is affected primarily by
>> broker fetch latency. However, the currently reported fetch latency does
>> not reflect the true fetch latency, because a fetch sometimes has to stay
>> in purgatory and wait for replica.fetch.wait.max.ms when data is not
>> available. This greatly skews the reported P50, P99, etc.
>>
>> I would like to propose a KIP to track the real fetch latency for both
>> broker followers and consumers.
>>
>> Ming
>>
>
