Copilot commented on code in PR #63194:
URL: https://github.com/apache/doris/pull/63194#discussion_r3231235926
##########
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job_metrics.groovy:
##########
@@ -200,12 +200,21 @@ suite("test_streaming_mysql_job_metrics",
metricCount++
}
+ def perJobLag = result.find {
+ it.tags?.metric ==
"doris_fe_streaming_job_per_job_lag" &&
+ it.tags?.job_name == "${jobName}"
+ }
+ if (perJobLag != null) {
+ log.info("per-job lag: ${perJobLag}".toString())
+ metricCount++
+ }
+
}
}
- // 9 streaming_job_* counters + 1 doris_fe_job RUNNING gauge + 5
per-job metrics
- if (metricCount >= 15) {
+ // 9 streaming_job_* counters + 1 doris_fe_job RUNNING gauge + 6
per-job metrics
+ if (metricCount >= 16) {
break
}
Review Comment:
The loop exit condition now depends on finding the lag metric (metricCount
>= 16), but for JDBC CDC lag is N/A during snapshot phase
(SourceOffsetProvider#getLag returns empty). If the implementation follows the
PR description (don’t register lag when empty), this test can become flaky/time
out waiting for the job to reach incremental phase. Consider making the lag
metric check value-aware (only count it when present and non-negative) and/or
not including it in the minimum required count until it becomes applicable.
##########
fe/fe-core/src/main/java/org/apache/doris/metric/MetricRepo.java:
##########
@@ -1328,6 +1330,18 @@ public Long getValue() {
failedTaskCount.addLabel(new MetricLabel("job_id", jobId))
.addLabel(new MetricLabel("job_name", jobName));
DORIS_METRIC_REGISTER.addMetrics(failedTaskCount);
+
+ GaugeMetric<Long> lag = new GaugeMetric<Long>(
+ STREAMING_JOB_PER_JOB_LAG, MetricUnit.SECONDS,
+ "per job lag in seconds of streaming job, -1 means
N/A") {
+ @Override
+ public Long getValue() {
+ return sJob.getLagSeconds();
+ }
+ };
+ lag.addLabel(new MetricLabel("job_id", jobId))
+ .addLabel(new MetricLabel("job_name", jobName));
+ DORIS_METRIC_REGISTER.addMetrics(lag);
Review Comment:
PR description says lag metric should not be registered when lag is not
applicable (e.g., S3/snapshot), but this code always registers the per-job lag
gauge for every StreamingInsertJob and uses -1 as a sentinel. That still
creates Prometheus series for N/A jobs (with a negative value), which is both
misleading and contradicts the stated behavior. Consider skipping
DORIS_METRIC_REGISTER.addMetrics(lag) when sJob.getLag() is empty /
sJob.getLagSeconds() < 0, or update the PR description and metric help text to
match the intended semantics.
##########
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java:
##########
@@ -871,6 +871,24 @@ public void onReplayCreate() throws JobException {
super.onReplayCreate();
}
+ public String getLag() {
+ return offsetProvider != null ? offsetProvider.getLag() : "";
+ }
+
+ // Numeric lag for metrics. Returns -1 when lag is not applicable (S3,
snapshot phase)
+ // or unparseable, so dashboards can filter N/A jobs via lag >= 0.
+ public long getLagSeconds() {
+ String lagStr = getLag();
+ if (lagStr == null || lagStr.isEmpty()) {
+ return -1L;
+ }
+ try {
+ return Long.parseLong(lagStr);
+ } catch (NumberFormatException e) {
+ return -1L;
Review Comment:
getLagSeconds() introduces a sentinel (-1) to represent 'not
applicable/unparseable'. When used for Prometheus export, this forces a numeric
series even when SourceOffsetProvider#getLag is intentionally empty for N/A
cases (snapshot/S3). If the goal is to avoid exporting misleading lag series,
consider returning an Optional/nullable value (or otherwise exposing a way to
distinguish N/A without a sentinel) so MetricRepo can omit registration when
lag is not applicable.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]