[jira] [Updated] (HDDS-6667) Recon can crash if processing a container report after installing an OM snapshot

Ethan Rose (Jira) Thu, 28 Apr 2022 15:03:05 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ethan Rose updated HDDS-6667:
-----------------------------
    Attachment: hs_err_pid46101.log.txt

> Recon can crash if processing a container report after installing an OM 
> snapshot
> --------------------------------------------------------------------------------
>
>                 Key: HDDS-6667
>                 URL: https://issues.apache.org/jira/browse/HDDS-6667
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Recon
>    Affects Versions: 1.1.0, 1.2.0, 1.2.1
>            Reporter: Ethan Rose
>            Assignee: Ethan Rose
>            Priority: Major
>         Attachments: hs_err_pid46101.log.txt
>
>
> Two threads access Recon's RocksDB instance: one applies updates based on the 
> OM DB state (ContainerKeyMapperTask), the other applies updates based on 
> container reports (ReconContainerReportHandler). When ContainerKeyMapperTask 
> updates from a snapshot, it must account for keys that may have been deleted. 
> The snapshot alone does not provide this information, so the task clears its 
> existing container -> key mappings and rebuilds them from scratch. It does 
> this by calling ContainerDBServiceProvider#initNewContainerDB, which deletes 
> the whole Recon DB from disk and creates a new one. This leads to the 
> following race:
> 1. ContainerKeyMapperTask#reprocess is called to do a snapshot-based update 
> from OM.
> 2. ContainerKeyMapperTask deletes and recreates the Recon DB.
> 3. Recon receives and processes a container report. When it needs to update 
> the DB, it may be using a stale handle to the old DB, or it may try to access 
> the DB after it has been deleted but before it has been recreated.
> This scenario caused a RocksDB crash on Recon, shown in the attached crash 
> dump.
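
The race in steps 1-3 above can be illustrated with a minimal sketch of one
possible guard: report handlers take a shared read lock while they use the DB
handle, and the snapshot reprocess takes the exclusive write lock while it
swaps the handle. This is not taken from the Ozone code base and is not
necessarily how the issue was resolved. FakeDb, GuardedDb, withDb and reinit
are hypothetical names, and an in-memory map stands in for RocksDB.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical in-memory stand-in for a handle to Recon's container DB. */
final class FakeDb {
  private final Map<Long, String> containerToKey = new HashMap<>();
  private boolean closed = false;

  synchronized void put(long containerId, String key) {
    // A real RocksDB handle may crash the JVM here; the stand-in just throws.
    if (closed) {
      throw new IllegalStateException("stale handle: DB already closed");
    }
    containerToKey.put(containerId, key);
  }

  synchronized int size() {
    return containerToKey.size();
  }

  synchronized void close() {
    closed = true;
  }
}

/** Hypothetical wrapper that never exposes the handle outside the lock. */
final class GuardedDb {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private FakeDb current = new FakeDb();

  /** Used by report handling: many callers may hold the read lock at once. */
  <T> T withDb(Function<FakeDb, T> action) {
    lock.readLock().lock();
    try {
      return action.apply(current);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Used by snapshot reprocessing: waits for in-flight users, then swaps. */
  void reinit() {
    lock.writeLock().lock();
    try {
      current.close();        // the old DB goes away
      current = new FakeDb(); // the new, empty DB takes its place
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Because callers only ever see the handle inside withDb, they cannot keep a
reference to the old DB across a reinit, and they cannot run in the window
between deletion and re-creation. The cost is that report processing blocks
for the duration of the swap.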



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

