sodonnel commented on PR #3836:
URL: https://github.com/apache/ozone/pull/3836#issuecomment-1307839545

   ```
   Rack 1: 1,2
   Rack 2: 3,4
   Rack 3: 1
   Rack 4: 1
   ```
   In this example the over-replication handler should remove two "1" replicas: 
one from rack 1 and then one from either rack 3 or rack 4, leaving:
   
   ```
   Rack 1: 2
   Rack 2: 3,4
   Rack 3: 1
   Rack 4:
   ```
   
   However, you are correct that making an extra copy of "1" doesn't do much 
good.
   
   I think we need to take a step back here. What does it mean to be 
mis-replicated?
   
   It means that the replicas are not spread across enough racks. If there are 
fewer racks on the cluster than the replicaNum, then it is also fine for there 
to be 2 replicas per rack, for example.
   
   Assuming there are plenty of racks, for a container to be mis-replicated 
when all the replicas are present, there must be some racks hosting more than 1 
replica. You can further extend this, and say, there must be some racks hosting 
more than 1 replica, where the replica is not also on another rack.
   
   ```
   R1: 1, 2
   R2: 1, 2
   R3: 3
   R4: 4
   R5: 5
   ```
   
   Above is not mis-replicated, it is simply over-replicated.
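
To make that definition concrete, here is a minimal Java sketch of the check. It assumes there are plenty of racks, models each rack as a plain list of replica indexes, and treats "more than 1 replica" as "more than 1 distinct index" (my assumption, so that two copies of the same index on one rack count as over-replication only). The class and method names are hypothetical, not existing Ozone code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of Ozone.
class MisReplicationCheckSketch {

  // racks: rack name -> replica indexes currently hosted on that rack.
  static boolean isMisReplicated(Map<String, List<Integer>> racks) {
    // replica index -> set of racks hosting at least one copy of it
    Map<Integer, Set<String>> racksByIndex = new HashMap<>();
    for (Map.Entry<String, List<Integer>> e : racks.entrySet()) {
      for (int index : e.getValue()) {
        racksByIndex.computeIfAbsent(index, k -> new HashSet<>())
            .add(e.getKey());
      }
    }
    for (List<Integer> onRack : racks.values()) {
      Set<Integer> distinct = new HashSet<>(onRack);
      if (distinct.size() <= 1) {
        continue; // a rack with a single index cannot be the problem
      }
      for (int index : distinct) {
        // An index that lives only on this crowded rack means the
        // container is mis-replicated.
        if (racksByIndex.get(index).size() == 1) {
          return true;
        }
      }
    }
    return false;
  }
}
```

For the five-rack layout above this returns false, because indexes 1 and 2 each also exist on another rack.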
   
   A significant complication in the solution to this problem is that a 
container can be both over-replicated and mis-replicated at the same time. If 
we remove the over-replication part, then the problem becomes simpler, as we 
can then move any random replica from a rack with more than 1 index.
   
   One idea that is worth thinking about: what if we changed the 
ECReplicationCheckHandler to return health states in this order:
   
   underReplicated
   overReplicated
   misReplicated
   
   If a container is both over- and mis-replicated, rather than its state being 
mis-replicated (actually under-replicated due to mis-replication), it will 
return as over-replicated. Once the over-replication gets fixed, it will be 
re-processed and come out as mis-replicated.
   
   Of course, fixing the mis-replication will cause it to become 
over-replicated again, but I feel this over- plus mis-replicated state will be 
a relatively rare occurrence in practice.
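
As a rough illustration of that ordering, the check could short-circuit like the sketch below. The enum, class, and method names are hypothetical and do not reflect the real ECReplicationCheckHandler API; the three booleans stand in for whatever the handler actually computes.

```java
// Hypothetical sketch of the proposed precedence, not Ozone code.
class HealthOrderSketch {

  enum Health { UNDER_REPLICATED, OVER_REPLICATED, MIS_REPLICATED, HEALTHY }

  // Report the first state that applies, in the proposed order.
  static Health classify(boolean under, boolean over, boolean mis) {
    if (under) {
      return Health.UNDER_REPLICATED;
    }
    if (over) {
      return Health.OVER_REPLICATED; // wins over mis-replication
    }
    if (mis) {
      return Health.MIS_REPLICATED;
    }
    return Health.HEALTHY;
  }
}
```

With this order, a container that is both over- and mis-replicated reports OVER_REPLICATED first and only shows up as MIS_REPLICATED on a later pass.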
   
   Alternatively, I wonder if an algorithm like this would work even with 
over-replication:
   
   ```
   for each rack_group
     if replicas_on_rack > 1
       for each replica_on_rack
         if another_copy_of_replica_exists
           continue // do nothing as we don't need to copy it
         else
           copy_to_new_rack
         end if
       end for
     end if
   end for
   ```
   
   I think this would handle these scenarios:
   
   ```
   R1: 1, 2
   R2: 2, 3
   R3: 1
   R4: 4
   R5: 5
   
   R1: 1, 2, 4
   R2: 2, 3
   R3: 1
   R4:
   R5: 5
   
   R1: 1, 1, 2
   R2: 2, 3
   R3: 
   R4: 4
   R5: 5
   ```
   
   If the above works, then we just need 2 maps:
   
   replica_index -> count_of_occurrences
   rack -> List<ContainerReplica>
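
A minimal Java sketch of the loop using those two maps follows. To keep it self-contained it models the second map as rack name -> list of replica indexes instead of List<ContainerReplica>, and it only collects the indexes that would be copied rather than scheduling anything. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the pseudocode above, not Ozone code.
class MisReplicationSketch {

  // racks: rack name -> replica indexes hosted on that rack.
  // Returns the indexes that would need a copy on a new rack.
  static List<Integer> indexesToCopy(Map<String, List<Integer>> racks) {
    // Map 1: replica_index -> count_of_occurrences across all racks.
    Map<Integer, Integer> occurrences = new HashMap<>();
    for (List<Integer> onRack : racks.values()) {
      for (int index : onRack) {
        occurrences.merge(index, 1, Integer::sum);
      }
    }

    List<Integer> toCopy = new ArrayList<>();
    // Map 2 (the racks argument): walk each rack holding more than 1 replica.
    for (List<Integer> onRack : racks.values()) {
      if (onRack.size() <= 1) {
        continue;
      }
      for (int index : onRack) {
        if (occurrences.get(index) > 1) {
          continue; // another copy exists, so no need to copy it
        }
        toCopy.add(index); // the only copy sits on a crowded rack
      }
    }
    return toCopy;
  }
}
```

Run against the three scenarios above, this sketch picks index 3 for the first, indexes 4 and 3 for the second, and index 3 for the third. It does not choose a target rack or remove the surplus replicas; that would still be left to the over-replication handling.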

