Herbert Wang created FLINK-39617:
------------------------------------

             Summary: Add batch REST endpoints for aggregated subtask metrics across multiple job vertices
                 Key: FLINK-39617
                 URL: https://issues.apache.org/jira/browse/FLINK-39617
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Metrics, Runtime / REST
            Reporter: Herbert Wang


The JobManager REST API currently exposes aggregated subtask metrics per job 
vertex via:

{code}
GET /jobs/:jobid/vertices/:vertexid/subtasks/metrics
{code}

Clients that need the same metric set for many vertices, such as autoscalers or 
monitoring integrations, must issue one request per vertex for metric-name 
discovery and another request per vertex for metric values. For jobs with many 
vertices this creates avoidable REST fan-out, repeated MetricFetcher updates, 
and large repeated payloads.
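
To make the fan-out concrete, here is a minimal sketch of the request list a client builds today. The helper name `per_vertex_requests` is hypothetical, and the `get` and `agg` query parameters reflect my understanding of the existing endpoint, so treat them as assumptions to verify against the REST API documentation.

```python
from urllib.parse import quote, urlencode


def per_vertex_requests(job_id, vertex_ids, metrics, aggs):
    """Return the (method, path) pairs issued per polling cycle: two per vertex."""
    requests = []
    for vertex_id in vertex_ids:
        base = f"/jobs/{quote(job_id)}/vertices/{quote(vertex_id)}/subtasks/metrics"
        # Request 1: metric-name discovery (no query parameters).
        requests.append(("GET", base))
        # Request 2: aggregated values for the selected metrics.
        # Assumed query parameters: comma-separated `get` and `agg`.
        query = urlencode({"get": ",".join(metrics), "agg": ",".join(aggs)})
        requests.append(("GET", f"{base}?{query}"))
    return requests
```

For a job with N vertices this yields 2 * N requests per cycle.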

h2. Proposal

Add two batch JobManager REST endpoints for aggregated subtask metrics across 
multiple vertices:

{code}
POST /jobs/:jobid/vertices/subtasks/metrics/names
POST /jobs/:jobid/vertices/subtasks/metrics/values
{code}

The existing single-vertex endpoint should remain unchanged for compatibility.

The endpoints are intentionally split rather than using one POST endpoint with 
mode-switching behavior, so OpenAPI schemas, code generation, and capability 
detection remain straightforward.

h3. Name discovery endpoint

Request:
{code:json}
{
  "vertexIds": ["<jobVertexId>", "<jobVertexId>"],
  "regex": [".*busyTime.*", ".*numRecords.*"]
}
{code}

Response:
{code:json}
[
  {
    "vertexId": "<jobVertexId>",
    "metrics": [{ "id": "busyTimeMsPerSecond" }]
  }
]
{code}
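
A client-side sketch of the request and response shapes above, assuming the JSON field names stay exactly as proposed. The helper names `build_names_request` and `parse_names_response` are hypothetical and not part of Flink.

```python
import json


def build_names_request(vertex_ids, regexes):
    """Serialize the name-discovery request body shown above."""
    return json.dumps({"vertexIds": list(vertex_ids), "regex": list(regexes)})


def parse_names_response(body):
    """Map each vertexId in the response to its list of metric names."""
    return {
        entry["vertexId"]: [metric["id"] for metric in entry["metrics"]]
        for entry in json.loads(body)
    }
```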

h3. Value aggregation endpoint

Request:
{code:json}
{
  "vertices": [
    { "vertexId": "<jobVertexId>", "metrics": ["busyTimeMsPerSecond"] },
    { "vertexId": "<jobVertexId>", "metrics": ["numRecordsInPerSecond"] }
  ],
  "agg": ["min", "max", "avg"]
}
{code}

Response:
{code:json}
[
  {
    "vertexId": "<jobVertexId>",
    "metrics": [
      { "id": "busyTimeMsPerSecond", "min": 0.0, "max": 1.0, "avg": 0.5 }
    ]
  }
]
{code}
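
A sketch of how a client might build the value request from a per-vertex metric selection and flatten the response for lookup. It assumes the field names above are final; the helper names are hypothetical.

```python
def build_values_request(metrics_by_vertex, aggs):
    """Build the value-aggregation request body from {vertexId: [metric, ...]}."""
    return {
        "vertices": [
            {"vertexId": vertex_id, "metrics": list(metrics)}
            for vertex_id, metrics in metrics_by_vertex.items()
        ],
        "agg": list(aggs),
    }


def parse_values_response(entries):
    """Turn the decoded response into {vertexId: {metricId: {agg: value}}}."""
    result = {}
    for entry in entries:
        result[entry["vertexId"]] = {
            # Every key other than "id" is treated as an aggregation result.
            metric["id"]: {k: v for k, v in metric.items() if k != "id"}
            for metric in entry["metrics"]
        }
    return result
```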

h2. Compatibility

This is additive. The existing endpoint remains unchanged:

{code}
GET /jobs/:jobid/vertices/:vertexid/subtasks/metrics
{code}

Clients can feature-detect the new endpoints and fall back to the existing 
per-vertex endpoint when they are unavailable. Alternatively, the change could 
be cherry-picked to earlier 2.x release branches.
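
A minimal sketch of the feature-detection fallback for name discovery. Several points are assumptions rather than part of the proposal: that an older JobManager answers the unknown batch path with 404 or 405, that the existing endpoint returns a list of `{"id": ...}` objects when called without query parameters, and that regexes must match the full metric name. The `send` callable is a hypothetical transport returning `(status, decoded_json)`.

```python
import re


def fetch_metric_names(send, job_id, vertex_ids, regexes):
    """Return {vertexId: [metric names]}, preferring the batch endpoint."""
    batch_path = f"/jobs/{job_id}/vertices/subtasks/metrics/names"
    body = {"vertexIds": vertex_ids, "regex": regexes}
    status, payload = send("POST", batch_path, body)
    if status == 200:
        return {e["vertexId"]: [m["id"] for m in e["metrics"]] for e in payload}
    if status not in (404, 405):  # assumed "endpoint unavailable" signals
        raise RuntimeError(f"unexpected status {status} from {batch_path}")

    # Fallback: one discovery request per vertex, filtered client-side.
    patterns = [re.compile(r) for r in regexes]
    names = {}
    for vertex_id in vertex_ids:
        path = f"/jobs/{job_id}/vertices/{vertex_id}/subtasks/metrics"
        status, payload = send("GET", path, None)
        if status != 200:
            raise RuntimeError(f"unexpected status {status} from {path}")
        names[vertex_id] = [
            m["id"] for m in payload
            if any(p.fullmatch(m["id"]) for p in patterns)
        ]
    return names
```

Injecting the transport keeps the sketch independent of any particular HTTP client.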
