J-HowHuang opened a new pull request, #17799:
URL: https://github.com/apache/pinot/pull/17799
## Description
For tables that use `FD_AWARE_INSTANCE_PARTITION_SELECTOR` as the partition
selector in their instance assignment config, rebalance is likely to fail when
`minimizeDataMovement=true` and the instances are unchanged in every pool.
## Reproduce
Run the offline quickstart. For table `airlineStats_OFFLINE`, remove its
`tierConfigs` and add the following `instanceAssignmentConfigMap`:
```
"instanceAssignmentConfigMap": {
  "OFFLINE": {
    "tagPoolConfig": {
      "tag": "DefaultTenant_OFFLINE",
      "poolBased": false,
      "numPools": 1
    },
    "replicaGroupPartitionConfig": {
      "replicaGroupBased": true,
      "numInstances": 0,
      "numReplicaGroups": 1,
      "numInstancesPerReplicaGroup": 1,
      "numPartitions": 0,
      "numInstancesPerPartition": 0,
      "minimizeDataMovement": false
    },
    "partitionSelector": "FD_AWARE_INSTANCE_PARTITION_SELECTOR",
    "minimizeDataMovement": false
  }
}
```
Run rebalance with minimize data movement enabled. The rebalance fails with:
```
{
  "jobId": "b6c56b27-450a-49c0-ac69-16835ed47fff",
  "status": "FAILED",
  "description": "Caught exception while fetching/calculating instance partitions: java.util.NoSuchElementException"
}
```
Controller log:
```
2026/03/03 12:23:37.194 INFO [InstanceTagPoolSelector] [grizzly-http-server-2] Selecting 1 instances for table: airlineStats_OFFLINE
2026/03/03 12:23:37.194 INFO [InstanceAssignmentDriver] [grizzly-http-server-2] No instance constraint is configured, using default hash-based-rotate instance constraint
2026/03/03 12:23:37.194 INFO [HashBasedRotateInstanceConstraintApplier] [grizzly-http-server-2] Rotating instances for table: airlineStats_OFFLINE with hash: 802879867
2026/03/03 12:23:37.194 INFO [FDAwareInstancePartitionSelector] [grizzly-http-server-2] Assigning 1 replica groups to 1 fault domains
2026/03/03 12:23:37.194 INFO [FDAwareInstancePartitionSelector] [grizzly-http-server-2] Warning, normalizing isn't finished yet
2026/03/03 12:23:37.195 WARN [TableRebalancer-airlineStats_OFFLINE-2b912476-fb58-4b47-907a-7db8f87d85c1] [grizzly-http-server-2] Caught exception while fetching/calculating instance partitions, aborting the rebalance
java.util.NoSuchElementException
	at java.base/java.util.TreeMap.key(TreeMap.java:1602)
	at java.base/java.util.TreeMap.firstKey(TreeMap.java:291)
	at org.apache.pinot.controller.helix.core.assignment.instance.FDAwareInstancePartitionSelector$CandidateQueue.<init>(FDAwareInstancePartitionSelector.java:255)
	at org.apache.pinot.controller.helix.core.assignment.instance.FDAwareInstancePartitionSelector$ReplicaGroupBasedAssignmentState.fill(FDAwareInstancePartitionSelector.java:395)
	at org.apache.pinot.controller.helix.core.assignment.instance.FDAwareInstancePartitionSelector.selectInstances(FDAwareInstancePartitionSelector.java:195)
	at org.apache.pinot.controller.helix.core.assignment.instance.InstanceAssignmentDriver.getInstancePartitions(InstanceAssignmentDriver.java:154)
	at org.apache.pinot.controller.helix.core.assignment.instance.InstanceAssignmentDriver.getInstancePartitions(InstanceAssignmentDriver.java:126)
	at org.apache.pinot.controller.helix.core.assignment.instance.InstanceAssignmentDriver.assignInstances(InstanceAssignmentDriver.java:69)
	at org.apache.pinot.controller.helix.core.rebalance.TableRebalancer.getInstancePartitions(TableRebalancer.java:1274)
```
## Change
Add a check before calling
`FDAwareInstancePartitionSelector$ReplicaGroupBasedAssignmentState.fill` to see
whether there are any instances to fill; if there are none, skip the call.
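
For illustration only, here is a minimal, self-contained sketch of the failure mode and of this kind of guard. It is not the code in this PR: `EmptyFillGuardSketch`, its nested `CandidateQueue`, and `fillIfNeeded` are hypothetical stand-ins, and the map of fault domain id to instance names is an assumed data shape. The only behavior taken as given is that `TreeMap.firstKey()` throws `NoSuchElementException` on an empty map, which matches the top frames of the stack trace.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical sketch; names and data shapes are assumptions, not Pinot's actual classes.
final class EmptyFillGuardSketch {

    // Stand-in for a queue of candidate instances keyed by fault domain id.
    static final class CandidateQueue {
        private final int firstFaultDomain;

        CandidateQueue(TreeMap<Integer, List<String>> candidatesByFaultDomain) {
            // TreeMap.firstKey() throws NoSuchElementException when the map is empty.
            this.firstFaultDomain = candidatesByFaultDomain.firstKey();
        }

        int firstFaultDomain() {
            return firstFaultDomain;
        }
    }

    // Guarded fill: returns false and does nothing when there are no instances left to place.
    static boolean fillIfNeeded(TreeMap<Integer, List<String>> remainingCandidates) {
        if (remainingCandidates.isEmpty()) {
            return false; // nothing to fill, so skip building the queue
        }
        new CandidateQueue(remainingCandidates);
        return true;
    }

    public static void main(String[] args) {
        TreeMap<Integer, List<String>> empty = new TreeMap<>();

        boolean threw = false;
        try {
            new CandidateQueue(empty); // unguarded path
        } catch (NoSuchElementException e) {
            threw = true;
        }
        System.out.println("unguarded threw: " + threw);
        System.out.println("guarded filled: " + fillIfNeeded(empty));
    }
}
```

With a non-empty map the guarded path builds the queue as before, so the check only changes behavior in the case where nothing remains to be assigned.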