jineshparakh opened a new pull request, #17736:
URL: https://github.com/apache/pinot/pull/17736
Background:
Pinot tracks per-partition primary key counts for realtime tables with
upsert or dedup enabled. These counts are exposed via two server APIs:
- `/tables/{tableName}/metadata` — per-partition PK counts per server
- `/instance/primaryKeyCount` — total PK count across all tables on the
instance
Issue:
`RealtimeTableDataManager.getUpsertPartitionToPrimaryKeyCount()` only
checked `isUpsertEnabled()`. For dedup tables, it always returned an empty map,
even though the dedup metadata manager already tracked PK counts internally.
This caused the metadata API to return an empty PK count map for dedup tables,
and the instance-level API to exclude dedup PKs from the total.
Fix:
- Renamed `getUpsertPartitionToPrimaryKeyCount()` to
`getPartitionToPrimaryKeyCount()` and added a fallback to check
`isDedupEnabled()` and delegate to `_tableDedupMetadataManager`.
- Renamed internal fields/getters from `upsert`-prefixed to generic names
across `TableMetadataInfo`, `ServerSegmentMetadataReader`, `TablesResource`,
and `PrimaryKeyCount`. The `@JsonProperty` annotation explicitly keeps the
old name (e.g. `upsertPartitionToServerPrimaryKeyCountMap` in the metadata
response), so the JSON wire format is unchanged during rolling upgrades.
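For readers skimming the description, the fallback order described above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the actual `RealtimeTableDataManager` code: the `PrimaryKeyCountSketch` class, the `PartitionKeyCounter` interface, and the use of `null` to mean "feature disabled" are assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;

public class PrimaryKeyCountSketch {
  /** Hypothetical stand-in for the upsert/dedup metadata managers; not Pinot's real interface. */
  public interface PartitionKeyCounter {
    Map<Integer, Long> getPartitionToPrimaryKeyCount();
  }

  // Assumption for this sketch: a null manager means the feature is disabled for the table.
  private final PartitionKeyCounter _upsertManager;
  private final PartitionKeyCounter _dedupManager;

  public PrimaryKeyCountSketch(PartitionKeyCounter upsertManager, PartitionKeyCounter dedupManager) {
    _upsertManager = upsertManager;
    _dedupManager = dedupManager;
  }

  public boolean isUpsertEnabled() {
    return _upsertManager != null;
  }

  public boolean isDedupEnabled() {
    return _dedupManager != null;
  }

  /** Checks upsert first, then falls back to dedup, then to an empty map. */
  public Map<Integer, Long> getPartitionToPrimaryKeyCount() {
    if (isUpsertEnabled()) {
      return _upsertManager.getPartitionToPrimaryKeyCount();
    }
    if (isDedupEnabled()) {
      // Before the fix this branch did not exist, so dedup tables fell through to the empty map.
      return _dedupManager.getPartitionToPrimaryKeyCount();
    }
    return Collections.emptyMap();
  }

  public static void main(String[] args) {
    // A dedup-only table holding 35 primary keys in partition 1.
    PrimaryKeyCountSketch dedupTable = new PrimaryKeyCountSketch(null, () -> Map.of(1, 35L));
    System.out.println(dedupTable.getPartitionToPrimaryKeyCount());

    // A table with neither feature enabled still reports an empty map.
    PrimaryKeyCountSketch plainTable = new PrimaryKeyCountSketch(null, null);
    System.out.println(plainTable.getPartitionToPrimaryKeyCount().isEmpty());
  }
}
```

In this sketch a table with neither upsert nor dedup enabled still gets an empty map, matching the behavior described for such tables before and after the change.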
Tests:
Manual verification with a cluster running both upsert and dedup tables:
Before fix:
```
> curl -s http://localhost:7500/tables/dedupMeetupRsvp_REALTIME/metadata
{
"tableName" : "dedupMeetupRsvp_REALTIME",
"diskSizeInBytes" : 0,
"numSegments" : 1,
"numRows" : 0,
"columnLengthMap" : { },
"columnCardinalityMap" : { },
"maxNumMultiValuesMap" : { },
"columnIndexSizeMap" : { },
"upsertPartitionToServerPrimaryKeyCountMap" : { }
}
> curl -s http://localhost:7500/instance/primaryKeyCount
{
"instanceId" : "Server_100.112.214.70_7050",
"numPrimaryKeys" : 63,
"upsertAndDedupTables" : [ "upsertPartialMeetupRsvp_REALTIME",
"dedupMeetupRsvp_REALTIME" ],
"lastUpdatedTimeInEpochMs" : 1771584881254
}
```
After fix:
```
> curl -s http://localhost:7500/tables/dedupMeetupRsvp_REALTIME/metadata
{
"tableName" : "dedupMeetupRsvp_REALTIME",
"diskSizeInBytes" : 0,
"numSegments" : 1,
"numRows" : 0,
"columnLengthMap" : { },
"columnCardinalityMap" : { },
"maxNumMultiValuesMap" : { },
"columnIndexSizeMap" : { },
"upsertPartitionToServerPrimaryKeyCountMap" : {
"1" : {
"Server_100.112.214.70_7050" : 35
}
}
}
> curl -s http://localhost:7500/instance/primaryKeyCount
{
"instanceId" : "Server_100.112.214.70_7050",
"numPrimaryKeys" : 89,
"upsertAndDedupTables" : [ "upsertPartialMeetupRsvp_REALTIME",
"dedupMeetupRsvp_REALTIME" ],
"lastUpdatedTimeInEpochMs" : 1771584741720
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]