sodonnel commented on PR #5293:
URL: https://github.com/apache/ozone/pull/5293#issuecomment-1719733975

   > If currently 2 replica present then send delete to 2 DNs. Entry remains in 
memory([transactionToDNsCommitMap](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L282))
 as it requires 3 response.
   
   But isn't that the bug? We have sent 2 commands out, but are waiting for 3 
responses. If you send 2, you should expect 2 back, not 3.
   
   This is the code:
   
   ```
   final Set<ContainerReplica> replicas =
                 containerManager.getContainerReplicas(containerId);
             // The delete entry can be safely removed from the log if all the
             // corresponding nodes commit the txn. It is required to check that
             // the nodes returned in the pipeline match the replication factor.
      
            // Here - replicas.size() is the CM replica state NOW.
            // But dnsWithCommittedTxx, assuming all sent reply, is effectively 
replicas.size() in the past
            // So if the past was 2 as it was under replicated and now its 3, 
you are stuck. Although I think
            // you are saying that the command would be resent to the new 3rd 
and then you will have 3 responses?
            
             if (min(replicas.size(), dnsWithCommittedTxn.size()) 
                 >= container.getReplicationConfig().getRequiredNodes()) {
               List<UUID> containerDns = replicas.stream()
                   .map(ContainerReplica::getDatanodeDetails)
                   .map(DatanodeDetails::getUuid)
                   .collect(Collectors.toList());
               if (dnsWithCommittedTxn.containsAll(containerDns)) {
                 transactionToDNsCommitMap.remove(txID);
                 transactionToRetryCountMap.remove(txID);
                 if (LOG.isDebugEnabled()) {
                   LOG.debug("Purging txId={} from block deletion log", txID);
                 }
                 txIDsToBeDeleted.add(txID);
               }
             }
   ```
   
   I am just not sure if this change will fix this in all cases, and I wonder 
if we would be better ensuring we get the specific DN responses we care about, 
rather than always expecting 
`container.getReplicationConfig().getRequiredNodes()` if we didn't send out 
that many.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to