[jira] [Updated] (HDDS-7462) EC: Fix Reconstruction Issue with StaleRecoveringContainerScrubbingService

Swaminathan Balachandran (Jira) Sun, 06 Nov 2022 17:47:06 -0800


     [ 
https://issues.apache.org/jira/browse/HDDS-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Swaminathan Balachandran updated HDDS-7462:
-------------------------------------------
    Description: 
EC reconstruction (with the Write Chunk operation) recreates an open container 
with replica index 0 when StaleRecoveringContainerScrubbingService deletes the 
recovering container. Thus an invalid container with replica index 0 is 
created. This could potentially cause an SCM failure when the container is 
reported in a heartbeat. It could also leave a partially reconstructed 
container when a new block is written while the recovering container is being 
deleted.

Marking the recovering container as unhealthy should fix the issue. Failure 
handling in the Reconstruction Coordinator, which deletes the unhealthy 
container, will then clean up the stale container.

  was:EC Reconstruction(with Write Chunk Operation) recreates open with replica 
Index 0 when StaleRecoveringContainerScrubbingService deletes the recovering 
container. Thus an invalid container with replica 0 is created. This could 
potentially cause SCM failure when container is reported with heartbeat & also 
partial reconstructed container when a new block is written simultaneously with 
recovering container being deleted.


> EC: Fix Reconstruction Issue with StaleRecoveringContainerScrubbingService
> --------------------------------------------------------------------------
>
>                 Key: HDDS-7462
>                 URL: https://issues.apache.org/jira/browse/HDDS-7462
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Swaminathan Balachandran
>            Assignee: Swaminathan Balachandran
>            Priority: Critical
>
> EC reconstruction (with the Write Chunk operation) recreates an open 
> container with replica index 0 when StaleRecoveringContainerScrubbingService 
> deletes the recovering container. Thus an invalid container with replica 
> index 0 is created. This could potentially cause an SCM failure when the 
> container is reported in a heartbeat. It could also leave a partially 
> reconstructed container when a new block is written while the recovering 
> container is being deleted.
> Marking the recovering container as unhealthy should fix the issue. Failure 
> handling in the Reconstruction Coordinator, which deletes the unhealthy 
> container, will then clean up the stale container.
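
The race described above can be illustrated with a small toy model. This is 
only a sketch under stated assumptions: the class StaleRecoveringSketch and 
every method in it are hypothetical and are not Ozone APIs. It assumes, as the 
description says, that a write to a missing container implicitly creates an 
open container with replica index 0, and it contrasts a scrubber that deletes 
the recovering container with one that marks it unhealthy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the race; none of these names are real Ozone classes or methods. */
public class StaleRecoveringSketch {
    enum State { OPEN, RECOVERING, UNHEALTHY }

    /** Hypothetical stand-in for a container replica held by a datanode. */
    static final class Replica {
        final int replicaIndex;
        volatile State state;

        Replica(int replicaIndex, State state) {
            this.replicaIndex = replicaIndex;
            this.state = state;
        }
    }

    private final Map<Long, Replica> containers = new ConcurrentHashMap<>();

    /** Reconstruction starts by creating a RECOVERING replica with its real index. */
    void createRecovering(long id, int replicaIndex) {
        containers.put(id, new Replica(replicaIndex, State.RECOVERING));
    }

    /**
     * Assumed write path: a missing container is implicitly created as OPEN
     * with replica index 0. Writes to an UNHEALTHY container are rejected.
     */
    boolean writeChunk(long id) {
        Replica r = containers.computeIfAbsent(id, k -> new Replica(0, State.OPEN));
        return r.state != State.UNHEALTHY;
    }

    /** Problematic scrubber: deletion lets a later writeChunk recreate the container. */
    void scrubByDeleting(long id) {
        containers.remove(id);
    }

    /** Proposed scrubber: keep the replica but mark it UNHEALTHY so later writes fail. */
    void scrubByMarkingUnhealthy(long id) {
        Replica r = containers.get(id);
        if (r != null && r.state == State.RECOVERING) {
            r.state = State.UNHEALTHY;
        }
    }

    /** Coordinator-side cleanup after a failed write: delete only an UNHEALTHY replica. */
    boolean cleanupAfterFailure(long id) {
        Replica r = containers.get(id);
        return r != null && r.state == State.UNHEALTHY && containers.remove(id, r);
    }

    Replica get(long id) {
        return containers.get(id);
    }

    public static void main(String[] args) {
        StaleRecoveringSketch dn = new StaleRecoveringSketch();

        // Deleting scrubber: the late write silently recreates the container.
        dn.createRecovering(1L, 3);
        dn.scrubByDeleting(1L);
        System.out.println("write accepted: " + dn.writeChunk(1L)
            + ", replica index: " + dn.get(1L).replicaIndex);

        // Marking scrubber: the late write fails and the coordinator cleans up.
        dn.createRecovering(2L, 3);
        dn.scrubByMarkingUnhealthy(2L);
        System.out.println("write accepted: " + dn.writeChunk(2L)
            + ", cleaned up: " + dn.cleanupAfterFailure(2L));
    }
}
```

In this model the deleting scrubber leaves behind an open replica with index 0, 
while the marking scrubber keeps the original replica index, makes the late 
write fail, and leaves removal to the coordinator's failure handling.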



