[jira] [Assigned] (HDDS-8471) ReplicationManager: Avoid re-queuing duplicate message for under / over replicated containers

Attila Doroszlai (Jira) Thu, 27 Apr 2023 03:30:05 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Attila Doroszlai reassigned HDDS-8471:
--------------------------------------

    Assignee: Attila Doroszlai

> ReplicationManager: Avoid re-queuing duplicate message for under / over 
> replicated containers
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDDS-8471
>                 URL: https://issues.apache.org/jira/browse/HDDS-8471
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Stephen O'Donnell
>            Assignee: Attila Doroszlai
>            Priority: Major
>
> The under and over replication queues in ReplicationManager are created when 
> ReplicationManager checks the health of all containers in the system. Each 
> time it does that, it creates a new "ReplicationQueue" object wrapping the 
> under and over replicated queues.
> The OverReplicatedProcessor and UnderReplicatedProcessor both extend 
> UnhealthyReplicationProcessor, which dequeues messages and processes them. 
> If processing a message throws an exception, it saves the message in a list, 
> ready to enqueue it again later. It saves the message, rather than enqueuing 
> it immediately, to avoid an infinite loop when a container fails repeatedly.
> The issue is that while the under / over replicated processor is running, it 
> could be saving up containers to requeue while ReplicationManager finishes 
> checking all the containers and replaces the queue. The failed containers 
> are then requeued onto the "new" queue, possibly creating duplicates.
> While the duplicates should not cause any problem, it would be better to 
> handle this more gracefully.
> For example, if the queue has been replaced, drop the failed containers - but 
> how to check if the queue has been replaced?
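
One possible answer to the question above, as a minimal sketch only: have the
processor remember which queue object it started draining, and compare that
reference by identity before requeuing the saved failures. QueueHolder,
RequeueSketch and requeueIfSameQueue are hypothetical names made up for this
illustration and are not Ozone classes or methods; container IDs are modelled
as plain Long values.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the component that owns and swaps the queue. */
final class QueueHolder {
    private final AtomicReference<Queue<Long>> current =
        new AtomicReference<>(new ConcurrentLinkedQueue<>());

    /** Returns the queue that is current right now. */
    Queue<Long> get() {
        return current.get();
    }

    /** Publishes a fresh queue, as a completed health check would. */
    void replace(Queue<Long> fresh) {
        current.set(fresh);
    }
}

/** Hypothetical helper showing the "drop if replaced" decision. */
final class RequeueSketch {
    private RequeueSketch() {
    }

    /**
     * Requeues the failed container IDs only if the queue seen at the start
     * of the processing pass is still the current one.
     *
     * @return true if the failures were requeued, false if they were dropped
     */
    static boolean requeueIfSameQueue(QueueHolder holder,
                                      Queue<Long> seenAtStart,
                                      List<Long> failed) {
        Queue<Long> now = holder.get();
        // Identity comparison on purpose: two distinct queues with equal
        // contents must still count as "replaced".
        if (now != seenAtStart) {
            return false; // queue was replaced, drop the failures
        }
        // If a replacement happens after the check, these entries land on
        // the old queue and are discarded with it, so no duplicates arise.
        now.addAll(failed);
        return true;
    }
}
```

In this sketch the processor would call holder.get() once at the start of a
pass, collect failures in a local list as described above, and call
requeueIfSameQueue at the end. Dropping failures relies on the assumption
that the next full health check would detect those containers again.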



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

