Re: [DISCUSS] FLIP-XXX Support currentFetchEventTimeLag and processingLag metrics

lorenzo . affetti Wed, 24 Apr 2024 00:02:03 -0700

Hello jialiang tan, and thank you for your contribution!

Here are my questions:

1 - I don't know if exposing this as metric to the user could create some harm. 
There can be major discrepancies between the absolute time got by the 
TaskManagers (if clocks are not synchronized via ntp for example), and the 
results of the metric might be quite distant for different TMs. Furthermore, 
comparing the time on the Flink cluster with the event time of records might 
introduce completely inaccurate results. I think providing this should come 
with many disclaimers to the user. Maybe, more experienced contributors can 
comment on this as well.

2 - I don't think the name `processingLag` represents the processing time 
spent, I would rather see `processingTime` just for the semantics of the name 
itself.

3 - Do you really think the `processingTime` should be a gauge? I understand 
your justification for the fetch lag, but I think the processing time should be 
an histogram. For the inefficiency of this, how about some sampling (e.g.: only 
update the histogram 1 every 1000 events?)

4 - At this point, if we have the processing time and number of records, we 
could also add throughput as a metric, so that the user would know how many 
records/second the source is able to produce.

5 - For the "Kafka Connector" section: can this be generalized for connectors 
in general? Can you provide an example to better understand your statement 
about reflection?

6 - Does this introduce any UI change for representing the metric?

Thank you!
On Apr 22, 2024 at 12:26 +0200, jialiang tan <tanjialiang1...@gmail.com>, wrote:
> Sorry all, it seems bad formatting in my email message, now I send it again
> gently and hope it work.
>
> I would like to start a discussion about FLIP-XXX:
> SupportcurrentFetchEventTimeLag and processingLag metrics [1].
>
> The main motivation for this change was that I had some difficulties
> inimplementing the currentFetchEventTimeLag metrics for KafkaSource [2].
>
> So I proposed to let the SourceReaderMetricGroup provide an interface to
> capturethe FetchTime so that all the FLIP-27 [3] sources can easily
> implement thecurrentFetchEventTimeLag metrics.
>
> In addition, I propose to support the processingLag metric for the
> FLIP-27sources to measure the current processing latency of the source.
>
> See the FLIP [1] and Jira [2] for more details.
>
> Looking forward to your comments and opinions!
>
> Thanks,
> TanJiaLiang.
>
> [1]
> https://docs.google.com/document/d/1nPhh1A-v-a7zyQyl1A5-K5DeUqbfxNXdjr2TVBT-QMs/edit?usp=sharing
> [2] https://issues.apache.org/jira/browse/FLINK-33173
> [3]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
>
> >

Re: [DISCUSS] FLIP-XXX Support currentFetchEventTimeLag and processingLag metrics

Reply via email to