sodonnel commented on PR #3836:
URL: https://github.com/apache/ozone/pull/3836#issuecomment-1307839545
```
Rack 1: 1,2
Rack 2: 3,4
Rack 3: 1
Rack 4: 1
```
In this example the over-replication handler should remove two of the "1"
replicas: one from rack 1 and then one from either rack 3 or rack 4, leaving,
for example:
```
Rack 1: 2
Rack 2: 3,4
Rack 3: 1
Rack 4:
```
However, you are correct that making an extra copy of "1" doesn't do much
good.
I think we need to take a step back here. What does it mean to be
mis-replicated?
It means that the replicas are not spread across enough racks. If there are
fewer racks on the cluster than the replicaNum, then it is also fine for there
to be 2 replicas per rack, for example.
Assuming there are plenty of racks, for a container to be mis-replicated
when all the replicas are present, there must be some racks hosting more than 1
replica. You can further extend this, and say, there must be some racks hosting
more than 1 replica, where the replica is not also on another rack.
```
R1: 1, 2
R2: 1, 2
R3: 3
R4: 4
R5: 5
```
The above is not mis-replicated; it is simply over-replicated.
A significant complication in the solution to this problem is that a
container can be both over-replicated and mis-replicated at the same time. If
we remove the over-replication part, then the problem becomes simpler, as we
can then move any replica from a rack with more than 1 index.
One idea worth thinking about: what if we changed the
ECReplicationCheckHandler to return health states in this order:
underReplicated
overReplicated
misReplicated
If a container is both over- and mis-replicated, rather than its state being
mis-replicated (actually under-replicated due to mis-replication), it would
be returned as over-replicated. Once the over-replication gets fixed, it would
be re-processed and come out as mis-replicated.
Of course, fixing the mis-replication will cause it to become over-replicated
again, but I feel this over- plus mis-replicated state will be a relatively
rare occurrence in practice.
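To make the proposed ordering concrete, here is a minimal sketch. The class,
enum and method names are hypothetical and are not the actual
ECReplicationCheckHandler API; the three boolean flags stand in for whatever
checks the handler already performs.

```java
// Hypothetical sketch of the proposed check order; not the Ozone code.
class HealthStateOrderSketch {

  enum State { UNDER_REPLICATED, OVER_REPLICATED, MIS_REPLICATED, HEALTHY }

  // Returns the first matching state, checked in the proposed order:
  // under-replicated, then over-replicated, then mis-replicated.
  static State classify(boolean under, boolean over, boolean mis) {
    if (under) {
      return State.UNDER_REPLICATED;
    }
    if (over) {
      // Over-replication wins over mis-replication when both are present.
      return State.OVER_REPLICATED;
    }
    if (mis) {
      return State.MIS_REPLICATED;
    }
    return State.HEALTHY;
  }
}
```

With this ordering, a container that is both over- and mis-replicated is
reported as over-replicated first, and only shows up as mis-replicated on a
later pass once the excess replicas are gone.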
Alternatively, I wonder if an algorithm like this would work even with
over-replication:
```
for each rack_group
  if replicas_on_rack > 1
    for each replica_on_rack
      if (another_copy_of_replica_exists)
        continue // do nothing as we don't need to copy it
      else
        copy_to_new_rack
      end if
    end
  end if
end
```
I think this would handle these scenarios:
```
R1: 1, 2
R2: 2, 3
R3: 1
R4: 4
R5: 5

R1: 1, 2, 4
R2: 2, 3
R3: 1
R4:
R5: 5

R1: 1, 1, 2
R2: 2, 3
R3:
R4: 4
R5: 5
```
If the above works, then we just need 2 maps:
replica_index -> count_of_occurrences
rack -> List<ContainerReplica>
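
As a rough illustration of how the two maps could drive the pseudocode above,
here is a direct transcription into Java. It is only a sketch: the class and
method names are hypothetical, plain integers stand in for ContainerReplica,
and choosing the target rack for each copy is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the Ozone implementation.
class MisReplicationSketch {

  // rackToIndexes: rack name -> replica indexes hosted on that rack.
  // Returns "rack:index" entries for the replicas the pseudocode would copy.
  static List<String> replicasToCopy(Map<String, List<Integer>> rackToIndexes) {
    // Map 1: replica_index -> count_of_occurrences across all racks.
    Map<Integer, Integer> indexCount = new HashMap<>();
    for (List<Integer> indexes : rackToIndexes.values()) {
      for (int index : indexes) {
        indexCount.merge(index, 1, Integer::sum);
      }
    }

    List<String> toCopy = new ArrayList<>();
    // Map 2 is rackToIndexes itself; TreeMap only makes the order stable.
    for (Map.Entry<String, List<Integer>> rack
        : new TreeMap<>(rackToIndexes).entrySet()) {
      List<Integer> indexes = rack.getValue();
      if (indexes.size() <= 1) {
        continue; // rack hosts at most one replica, nothing to do
      }
      for (int index : indexes) {
        if (indexCount.get(index) > 1) {
          continue; // another copy exists, so we don't need to copy it
        }
        toCopy.add(rack.getKey() + ":" + index);
      }
    }
    return toCopy;
  }
}
```

Run against the first scenario above, this sketch skips both replicas on R1
(indexes 1 and 2 each have another copy) and flags only index 3 on R2.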