shauryachats opened a new pull request, #18433:
URL: https://github.com/apache/pinot/pull/18433
## Summary
Multi-stream realtime tables encode Pinot partition IDs as `streamIndex *
10000 + streamPartitionId`. Before this fix, `RealtimeSegmentAssignment` and
`ReplicaGroupSegmentAssignmentStrategy` used the raw (encoded) partition ID
directly when computing instance slots, which produced incorrect slot mappings
and broke colocation of segments belonging to the same partition.
### Changes:
- In `RealtimeSegmentAssignment.assignConsumingSegment`, decode the Pinot
partition ID to the stream-level partition ID via
`IngestionConfigUtils.getStreamPartitionIdFromPinotPartitionId` before
computing the instance index.
- In `ReplicaGroupSegmentAssignmentStrategy`, extract a
`getPartitionIdFromSegmentName` helper that applies the same decoding for
REALTIME tables before `% numPartitions`, fixing both single-segment assignment
and rebalance paths.
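
The effect of the decoding step can be illustrated with a small, self-contained sketch. It is not the Pinot implementation: the class `PartitionIdSketch` and its method names are hypothetical, and the modulo-based decoding is an assumption derived from the `streamIndex * 10000 + streamPartitionId` encoding described in the summary.

```java
/** Illustrative sketch only; class and method names are hypothetical, not Pinot APIs. */
final class PartitionIdSketch {
  // Assumed padding, taken from the `streamIndex * 10000 + streamPartitionId` encoding.
  static final int PARTITION_PADDING_OFFSET = 10000;

  /** Encodes a (stream index, stream partition) pair into a Pinot partition ID. */
  static int encode(int streamIndex, int streamPartitionId) {
    return streamIndex * PARTITION_PADDING_OFFSET + streamPartitionId;
  }

  /** Recovers the stream-level partition ID from an encoded Pinot partition ID. */
  static int decodeStreamPartitionId(int pinotPartitionId) {
    return pinotPartitionId % PARTITION_PADDING_OFFSET;
  }

  /** Behavior before the fix: the encoded ID is used directly. */
  static int slotFromRawId(int pinotPartitionId, int numPartitions) {
    return pinotPartitionId % numPartitions;
  }

  /** Behavior after the fix: decode first, then apply `% numPartitions`. */
  static int slotFromDecodedId(int pinotPartitionId, int numPartitions) {
    return decodeStreamPartitionId(pinotPartitionId) % numPartitions;
  }

  public static void main(String[] args) {
    int numPartitions = 3;
    int idStream0 = encode(0, 2); // 2
    int idStream1 = encode(1, 2); // 10002

    // Raw IDs: the same stream partition lands in different slots (2 vs 0).
    System.out.println(slotFromRawId(idStream0, numPartitions)
        + " vs " + slotFromRawId(idStream1, numPartitions));

    // Decoded IDs: both streams map partition 2 to the same slot (2 vs 2).
    System.out.println(slotFromDecodedId(idStream0, numPartitions)
        + " vs " + slotFromDecodedId(idStream1, numPartitions));
  }
}
```

With `numPartitions` set to 3, partition 2 of the second stream (encoded as 10002) maps to slot 0 when the raw ID is used, but to slot 2 after decoding, which matches partition 2 of the first stream.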
## Testing
Deployed this on an internal cluster containing a multi-topic table and
verified the segment assignment with `instanceAssignmentConfigMap` set as follows:
```
"instanceAssignmentConfigMap": {
  "CONSUMING": {
    "tagPoolConfig": {
      "tag": "cluster_REALTIME",
      "poolBased": false,
      "numPools": 0
    },
    "replicaGroupPartitionConfig": {
      "replicaGroupBased": true,
      "numInstances": 0,
      "numReplicaGroups": 2,
      "numInstancesPerReplicaGroup": 3,
      "numPartitions": 3,
      "numInstancesPerPartition": 1,
      "minimizeDataMovement": true,
      "partitionColumn": "trace_id"
    },
    "partitionSelector": "INSTANCE_REPLICA_GROUP_PARTITION_SELECTOR",
    "minimizeDataMovement": false
  },
  "COMPLETED": {
    "tagPoolConfig": {
      "tag": "cluster_REALTIME",
      "poolBased": false,
      "numPools": 0
    },
    "replicaGroupPartitionConfig": {
      "replicaGroupBased": true,
      "numInstances": 0,
      "numReplicaGroups": 2,
      "numInstancesPerReplicaGroup": 3,
      "numPartitions": 3,
      "numInstancesPerPartition": 1,
      "minimizeDataMovement": true,
      "partitionColumn": "trace_id"
    },
    "partitionSelector": "INSTANCE_REPLICA_GROUP_PARTITION_SELECTOR",
    "minimizeDataMovement": false
  }
},
```