arunkumarucet opened a new pull request, #18823:
URL: https://github.com/apache/pinot/pull/18823

   ## Summary
   
   - Adds a new `TABLE_TENANT_INFO` controller gauge emitted by 
`SegmentStatusChecker` that encodes the server tenant name as a key segment in 
the JMX metric name: 
`pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant> = 1`
   - Adds a dedicated JMX exporter rule in `controller.yml` that extracts 
`table`, `tableType`, `tenant`, and `database` as Prometheus labels from this 
metric
   - Enables tenant-scoped aggregation of any existing table-level metric in Prometheus via a `group_left(tenant)` join, with no changes required to the broker/server metric pipelines
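
   The exporter rule itself lives in `controller.yml`. As a rough illustration of the label extraction it performs, the Java sketch below applies a regex of the same general shape to the three name patterns (no database, with database, REALTIME). `TableTenantInfoLabels` and the exact pattern are hypothetical, not the rule shipped in this PR, and the sketch assumes database, table, and tenant names contain only word characters.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   // Hypothetical illustration of the label extraction; not the regex shipped in controller.yml.
   final class TableTenantInfoLabels {
     // pinot.controller.tableTenantInfo.[<database>.]<table>_<OFFLINE|REALTIME>.<tenant>
     // Assumes database, table, and tenant names contain only word characters.
     private static final Pattern NAME = Pattern.compile(
         "pinot\\.controller\\.tableTenantInfo\\."
             + "(?:(\\w+)\\.)?"             // optional database prefix
             + "(\\w+)_(OFFLINE|REALTIME)"  // raw table name and table type
             + "\\.(\\w+)");                // server tenant

     static Map<String, String> parse(String metricName) {
       Matcher m = NAME.matcher(metricName);
       if (!m.matches()) {
         throw new IllegalArgumentException("Unrecognized metric name: " + metricName);
       }
       Map<String, String> labels = new LinkedHashMap<>();
       if (m.group(1) != null) {
         labels.put("database", m.group(1));  // only present for database-scoped tables
       }
       labels.put("table", m.group(2));
       labels.put("tableType", m.group(3));
       labels.put("tenant", m.group(4));
       return labels;
     }

     public static void main(String[] args) {
       System.out.println(parse("pinot.controller.tableTenantInfo.airlineStats_OFFLINE.DefaultTenant"));
       System.out.println(parse("pinot.controller.tableTenantInfo.db1.events_REALTIME.tenantA"));
     }
   }
   ```

   Because the optional database group only matches when a further `<table>_<TYPE>.<tenant>` suffix follows, table names that themselves contain underscores (e.g. `my_table_REALTIME`) still resolve to the right `table` label.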
   
   ## Motivation
   
   Previously there was no way to aggregate table-scoped metrics (e.g. 
`numDocsScanned`, segment counts) by tenant in Prometheus/Grafana without 
scattered, disruptive changes to add a `tenant` tag throughout the metrics 
pipeline. This approach exposes the table→tenant mapping as a standalone info 
metric that Prometheus can join against:
   
   ```promql
   sum by (tenant) (
     sum by (table) (pinot_server_numDocsScanned_OneMinuteRate{...})
     * on(table) group_left(tenant)
     pinot_controller_tableTenantInfo_Value
   )
   ```
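
   To make the join semantics concrete, here is a hypothetical Java sketch that mimics what the query computes: sum each table's samples, attach the tenant from the info metric (its value is `1`, so the multiplication only carries the `tenant` label across), then sum by tenant. Tables with no info series drop out, as unmatched left-hand series do in PromQL vector matching. Keying the mapping by table alone mirrors `on(table)`, which assumes at most one info series per `table` value. `TenantJoinSketch` and the sample data are illustrative only.

   ```java
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;

   // Hypothetical simulation of:
   //   sum by (tenant) (sum by (table) (metric) * on(table) group_left(tenant) info)
   final class TenantJoinSketch {
     static Map<String, Double> sumByTenant(
         List<Map.Entry<String, Double>> samples,  // (table, value) pairs, e.g. one per server
         Map<String, String> tableToTenant) {      // the info metric: one tenant per table
       // sum by (table)
       Map<String, Double> perTable = new HashMap<>();
       for (Map.Entry<String, Double> s : samples) {
         perTable.merge(s.getKey(), s.getValue(), Double::sum);
       }
       // * on(table) group_left(tenant) info, then sum by (tenant).
       // The info value is 1, so the product keeps the per-table sum and gains a tenant label.
       Map<String, Double> perTenant = new TreeMap<>();
       for (Map.Entry<String, Double> t : perTable.entrySet()) {
         String tenant = tableToTenant.get(t.getKey());
         if (tenant == null) {
           continue;  // no matching info series: the left-hand series is dropped
         }
         perTenant.merge(tenant, t.getValue() * 1.0, Double::sum);
       }
       return perTenant;
     }

     public static void main(String[] args) {
       List<Map.Entry<String, Double>> samples = List.of(
           Map.entry("airlineStats", 10.0),
           Map.entry("airlineStats", 5.0),
           Map.entry("baseballStats", 2.0),
           Map.entry("orphanTable", 7.0));
       Map<String, String> info = Map.of("airlineStats", "DefaultTenant", "baseballStats", "tenantB");
       System.out.println(sumByTenant(samples, info));  // {DefaultTenant=15.0, tenantB=2.0}
     }
   }
   ```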
   
   ## Implementation
   
   **Emission strategy:**
   - The gauge is written only once per `(table, tenant)` pair: on first registration or when the tenant changes. It is **not** re-emitted on every 5-minute `SegmentStatusChecker` cycle; the checker returns early when the tenant is unchanged.
   - `_tableTenantMap` tracks the current tenant per table so that stale gauges are removed when the tenant changes, when the table config is null, and when the table is removed (`nonLeaderCleanup`).
   - The new gauge is registered **before** removing the old tenant's gauge on 
a tenant change, to avoid a scrape-window gap.
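
   A minimal single-threaded sketch of this strategy, assuming a plain map stands in for the `ControllerMetrics` gauge registry. `TableTenantGaugeTracker`, `update`, `remove`, and the `emits` counter are hypothetical names used for illustration, not the code in this PR.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical, single-threaded sketch of the emission strategy; a plain map stands in
   // for the ControllerMetrics gauge registry. Names are illustrative, not Pinot's API.
   final class TableTenantGaugeTracker {
     static final String DEFAULT_TENANT = "DefaultTenant";

     final Map<String, Long> gauges = new HashMap<>();                   // gauge name -> value
     private final Map<String, String> tableTenantMap = new HashMap<>(); // table -> current tenant
     int emits;                                                          // number of gauge writes

     static String gaugeName(String tableNameWithType, String tenant) {
       return "pinot.controller.tableTenantInfo." + tableNameWithType + "." + tenant;
     }

     // Called once per status-check cycle; serverTenant is null when none is configured.
     void update(String tableNameWithType, String serverTenant) {
       String tenant = serverTenant != null ? serverTenant : DEFAULT_TENANT;
       String previous = tableTenantMap.get(tableNameWithType);
       if (tenant.equals(previous)) {
         return;  // unchanged: skip re-emitting
       }
       // Register the new gauge before removing the old one to avoid a scrape-window gap.
       gauges.put(gaugeName(tableNameWithType, tenant), 1L);
       emits++;
       tableTenantMap.put(tableNameWithType, tenant);
       if (previous != null) {
         gauges.remove(gaugeName(tableNameWithType, previous));
       }
     }

     // Called when the table config is null or the table is removed.
     void remove(String tableNameWithType) {
       String previous = tableTenantMap.remove(tableNameWithType);
       if (previous != null) {
         gauges.remove(gaugeName(tableNameWithType, previous));
       }
     }
   }
   ```

   Calling `update` repeatedly with the same tenant writes the gauge once; switching tenants leaves exactly one gauge for the table, and `remove` leaves none.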
   
   **JMX metric name:**
   ```
   "org.apache.pinot.common.metrics":type="ControllerMetrics",
     name="pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant>"
   ```
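
   For illustration, the name can be reproduced with `javax.management.ObjectName`, assuming each part is wrapped with `ObjectName.quote` (which is what the quoted domain and values above suggest). `TableTenantInfoMBeanName` is a hypothetical helper, not part of Pinot.

   ```java
   import javax.management.MalformedObjectNameException;
   import javax.management.ObjectName;

   // Hypothetical helper; assumes every part of the name is wrapped with ObjectName.quote.
   final class TableTenantInfoMBeanName {
     static ObjectName of(String tableNameWithType, String serverTenant) {
       String metricName = "pinot.controller.tableTenantInfo." + tableNameWithType + "." + serverTenant;
       String raw = ObjectName.quote("org.apache.pinot.common.metrics")
           + ":type=" + ObjectName.quote("ControllerMetrics")
           + ",name=" + ObjectName.quote(metricName);
       try {
         return new ObjectName(raw);
       } catch (MalformedObjectNameException e) {
         throw new IllegalArgumentException("Invalid MBean name: " + raw, e);
       }
     }

     public static void main(String[] args) {
       ObjectName on = of("airlineStats_OFFLINE", "DefaultTenant");
       System.out.println(on.getDomain());                                // quoted domain
       System.out.println(ObjectName.unquote(on.getKeyProperty("name"))); // raw metric name
     }
   }
   ```

   Quoting the `name` value keeps the embedded table and tenant safe even if they contain characters that are otherwise reserved in an `ObjectName` value.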
   
   **Prometheus output (via JMX exporter):**
   ```
   pinot_controller_tableTenantInfo_Value{table="airlineStats", tableType="OFFLINE", tenant="DefaultTenant"} 1
   ```
   
   ## Test plan
   
   - [ ] `SegmentStatusCheckerTest#tableTenantInfoGaugeNamedTenantTest`: the named server tenant is registered
   - [ ] `SegmentStatusCheckerTest#tableTenantInfoGaugeDefaultTenantFallbackTest`: falls back to `DefaultTenant` when no tenant is configured
   - [ ] `SegmentStatusCheckerTest#tableTenantInfoGaugeTenantChangeCleansStaleGaugeTest`: the stale gauge is removed when the tenant changes
   - [ ] `SegmentStatusCheckerTest#tableTenantInfoGaugeTableRemovedCleansUpTest`: the gauge is cleaned up via `nonLeaderCleanup`
   - [ ] `SegmentStatusCheckerTest#tableTenantInfoGaugeRealtimeTableTest`: the REALTIME table type is covered
   - [ ] Verified locally with the batch quickstart: 10 MBeans registered, all with value=1; the JMX exporter regex was validated against the no-database, with-database, and REALTIME name patterns


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.