[
https://issues.apache.org/jira/browse/HDDS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai reassigned HDDS-8471:
--------------------------------------
Assignee: Attila Doroszlai
> ReplicationManager: Avoid re-queuing duplicate message for under / over
> replicated containers
> ---------------------------------------------------------------------------------------------
>
> Key: HDDS-8471
> URL: https://issues.apache.org/jira/browse/HDDS-8471
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Stephen O'Donnell
> Assignee: Attila Doroszlai
> Priority: Major
>
> The under and over replication queues in ReplicationManager are created when
> ReplicationManager checks the health of all containers in the system. Each
> time it does so, it creates a new "ReplicationQueue" object wrapping the
> under and over replicated queues.
> The OverReplicatedProcessor and UnderReplicatedProcessor both extend
> UnhealthyReplicationProcessor, which dequeues messages and processes them.
> If processing throws an exception, it saves the message in a list, ready to
> enqueue it again later. It saves the message, rather than enqueuing it
> immediately, to avoid an infinite loop when a container fails repeatedly.
> The issue is that while the under / over replication processor is running,
> it could be saving up containers to requeue, while ReplicationManager
> finishes processing all the containers and replaces the queue. The failed
> containers are then requeued onto the "new" queue, possibly creating
> duplicates.
> While the duplicates should not cause any problem, it would be better if
> this were handled more gracefully.
> For example, if the queue has been replaced, drop the failed containers. But
> how can we check whether the queue has been replaced?
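
One possible answer to the question above, as a minimal sketch only: the
processor could remember which queue object it started draining and compare
it by reference with the manager's current queue before requeuing. Every name
below (RequeueSketch, Manager, replaceQueue, processAll) is hypothetical and
is not the Ozone API; container IDs are modelled as plain Long values, and
the sketch is single-threaded, so it ignores the locking a real
implementation would need.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class RequeueSketch {

  /** Hypothetical stand-in for ReplicationManager: it owns the current queue. */
  public static class Manager {
    private final AtomicReference<Queue<Long>> current =
        new AtomicReference<>(new ArrayDeque<>());

    public Queue<Long> getQueue() {
      return current.get();
    }

    /** Simulates a full health-check pass publishing a fresh queue object. */
    public void replaceQueue(Queue<Long> fresh) {
      current.set(fresh);
    }
  }

  /**
   * One processor pass (hypothetical). Drains the queue that was current when
   * the pass started, collecting failures in a list instead of requeuing them
   * immediately. Returns the number of failed containers that were dropped
   * because the queue was replaced during the pass.
   */
  public static int processAll(Manager manager, Consumer<Long> handler) {
    Queue<Long> snapshot = manager.getQueue(); // queue seen at start of pass
    List<Long> failed = new ArrayList<>();
    Long id;
    while ((id = snapshot.poll()) != null) {
      try {
        handler.accept(id);
      } catch (RuntimeException e) {
        failed.add(id); // defer the requeue to avoid looping on a bad container
      }
    }
    if (manager.getQueue() != snapshot) {
      // Reference comparison: a different object means the manager has
      // published a new queue, which will already contain any container
      // that is still unhealthy. Requeuing here could create duplicates.
      return failed.size();
    }
    snapshot.addAll(failed); // same queue object: requeue for the next pass
    return 0;
  }
}
```

The check and the requeue are not atomic in this sketch, so a replacement
that lands between them would still go unnoticed; closing that window would
need a lock shared with the manager, or a generation counter carried by the
queue, both of which are assumptions beyond what the issue describes.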