devmadhuu commented on PR #10161: URL: https://github.com/apache/ozone/pull/10161#issuecomment-4386043605
> Hi @devmadhuu , do we see container replication eventually failed in real case due to this? I'm not sure if we need this as it will create more same container replicas on single datanode cases, and is current duplicate replica deletion logic ready to handle this? If we are sure the existing container directory is stale, can we just delete it and continue the import?

@ChenSammi thanks for your review. This was identified as part of some use case discussed. We don't have good conflict resolution for Ratis or EC when a DN ends up with multiple copies, but we can end up in this situation due to volume failures as well.

We need to prioritize moving to a safer state, which means allowing replication to pass. We do not want to end up in a situation where containers are perpetually under-replicated because no target nodes are valid. Since replication failure is not the only way DNs can end up with duplicate replicas, we should allow replication and just pick a different volume. Better duplicate replica handling on startup may require internal reconciliation within the DN but that can be done later.

cc: @errose28

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
