Herbert Wang created FLINK-39617:
------------------------------------
Summary: Add batch REST endpoints for aggregated subtask metrics
across multiple job vertices
Key: FLINK-39617
URL: https://issues.apache.org/jira/browse/FLINK-39617
Project: Flink
Issue Type: Improvement
Components: Runtime / Metrics, Runtime / REST
Reporter: Herbert Wang
The JobManager REST API currently exposes aggregated subtask metrics per job
vertex via:
{code}
GET /jobs/:jobid/vertices/:vertexid/subtasks/metrics
{code}
Clients that need the same metric set for many vertices, such as autoscalers or
monitoring integrations, must issue one request per vertex for metric-name
discovery and another request per vertex for metric values, i.e. 2N requests
for a job with N vertices. For jobs with many vertices this creates avoidable
REST fan-out, repeated MetricFetcher updates, and large repeated payloads.
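To make the fan-out concrete, here is a minimal sketch of the request plan a client has to build today. It relies on the {{get}} and {{agg}} query parameters of the existing endpoint; the helper name {{per_vertex_requests}} and the base URL are hypothetical and only serve the illustration.

```python
from urllib.parse import quote


def per_vertex_requests(base_url, job_id, vertex_ids, metrics, aggs):
    """Return the GET URLs a client issues today: two per vertex."""
    urls = []
    for vertex_id in vertex_ids:
        endpoint = f"{base_url}/jobs/{job_id}/vertices/{vertex_id}/subtasks/metrics"
        # Request 1: no query string, used for metric-name discovery.
        urls.append(endpoint)
        # Request 2: aggregated values for the selected metrics.
        get_param = quote(",".join(metrics), safe=",")
        agg_param = quote(",".join(aggs), safe=",")
        urls.append(f"{endpoint}?get={get_param}&agg={agg_param}")
    return urls
```

For a job with three vertices this yields six URLs, which is the 2-requests-per-vertex pattern the batch endpoints are meant to replace.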
h2. Proposal
Add two batch JobManager REST endpoints for aggregated subtask metrics across
multiple vertices:
{code}
POST /jobs/:jobid/vertices/subtasks/metrics/names
POST /jobs/:jobid/vertices/subtasks/metrics/values
{code}
The existing single-vertex endpoint should remain unchanged for compatibility.
The endpoints are intentionally split rather than using one POST endpoint with
mode-switching behavior, so OpenAPI schemas, code generation, and capability
detection remain straightforward.
h3. Name discovery endpoint
Request:
{code:json}
{
  "vertexIds": ["<jobVertexId>", "<jobVertexId>"],
  "regex": [".*busyTime.*", ".*numRecords.*"]
}
{code}
Response:
{code:json}
[
  {
    "vertexId": "<jobVertexId>",
    "metrics": [{ "id": "busyTimeMsPerSecond" }]
  }
]
{code}
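A client-side view of the name discovery exchange could look like the sketch below. It only assumes the request and response shapes shown above; the function names {{build_names_request}} and {{parse_names_response}} are hypothetical, and the sketch does not define how the server combines multiple patterns.

```python
import json


def build_names_request(vertex_ids, patterns):
    """Serialize the request body for the proposed names endpoint."""
    return json.dumps({"vertexIds": list(vertex_ids), "regex": list(patterns)})


def parse_names_response(body):
    """Map each vertexId to the list of metric ids reported for it."""
    return {
        entry["vertexId"]: [metric["id"] for metric in entry["metrics"]]
        for entry in json.loads(body)
    }
```

The parsed mapping can be fed directly into the per-vertex metric lists of the value aggregation request.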
h3. Value aggregation endpoint
Request:
{code:json}
{
  "vertices": [
    { "vertexId": "<jobVertexId>", "metrics": ["busyTimeMsPerSecond"] },
    { "vertexId": "<jobVertexId>", "metrics": ["numRecordsInPerSecond"] }
  ],
  "agg": ["min", "max", "avg"]
}
{code}
Response:
{code:json}
[
  {
    "vertexId": "<jobVertexId>",
    "metrics": [{ "id": "busyTimeMsPerSecond", "min": 0.0, "max": 1.0, "avg": 0.5 }]
  }
]
{code}
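The value aggregation exchange can be sketched the same way. The helpers below are hypothetical and assume nothing beyond the JSON shapes shown above: every key of a metric object other than {{id}} is treated as an aggregation result.

```python
import json


def build_values_request(metrics_by_vertex, aggs):
    """Serialize the request body for the proposed values endpoint."""
    vertices = [
        {"vertexId": vertex_id, "metrics": list(metrics)}
        for vertex_id, metrics in metrics_by_vertex.items()
    ]
    return json.dumps({"vertices": vertices, "agg": list(aggs)})


def parse_values_response(body):
    """Return {vertexId: {metricId: {aggregation: value}}}."""
    result = {}
    for entry in json.loads(body):
        per_metric = {}
        for metric in entry["metrics"]:
            # Everything except "id" is an aggregation such as min/max/avg.
            per_metric[metric["id"]] = {
                key: value for key, value in metric.items() if key != "id"
            }
        result[entry["vertexId"]] = per_metric
    return result
```

Keeping the result keyed by vertex and metric id lets a client such as an autoscaler look up one aggregation without scanning the response array.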
h2. Compatibility
This is additive. The existing endpoint remains unchanged:
{code}
GET /jobs/:jobid/vertices/:vertexid/subtasks/metrics
{code}
Clients can feature-detect the new endpoints and fall back to the existing
per-vertex endpoint when they are unavailable. Alternatively, the change could
be cherry-picked to earlier 2.x versions.
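One possible shape for the client-side fallback is sketched below. How an older JobManager signals a missing endpoint is not specified in this ticket; the sketch assumes the transport layer translates that condition (for example an HTTP 404) into the hypothetical {{EndpointUnavailable}} exception. The two callables are injected so the logic stays independent of any HTTP library.

```python
class EndpointUnavailable(Exception):
    """Hypothetical signal that the batch endpoint does not exist on the server."""


def fetch_values(metrics_by_vertex, aggs, batch_call, per_vertex_call):
    """Try the batch endpoint first, then fall back to one call per vertex."""
    try:
        return batch_call(metrics_by_vertex, aggs)
    except EndpointUnavailable:
        # Older JobManager: reproduce the result with the existing endpoint.
        return {
            vertex_id: per_vertex_call(vertex_id, metrics, aggs)
            for vertex_id, metrics in metrics_by_vertex.items()
        }
```

A client would typically cache the outcome of the first attempt so that later polling cycles skip the failing batch request.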