[
https://issues.apache.org/jira/browse/HDDS-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christos Bisias updated HDDS-8173:
----------------------------------
Attachment: rocksDBContainerDelete.diff
> SchemaV3 RocksDB entries are not removed after container delete
> ---------------------------------------------------------------
>
> Key: HDDS-8173
> URL: https://issues.apache.org/jira/browse/HDDS-8173
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Christos Bisias
> Priority: Critical
> Attachments: rocksDBContainerDelete.diff
>
>
> After a container is deleted, all RocksDB entries for that container should be
> removed as well. Instead, its metadata and block data are still intact in the DB.
>
> We can reproduce this issue on a docker cluster as follows:
> * start a docker cluster with 5 datanodes
> * put a key under a bucket to create a container
> * close the container
> * decommission 2 of the datanodes that hold replicas of the container
> * recommission the datanodes
> * container should be over-replicated
> * ReplicationManager should issue a container delete for 2 datanodes
> * Check one of the two datanodes
> * Container should be deleted
> * Check RocksDB block data entries for the container
>
> on {color:#00875a}master{color}, from the Ozone root directory
>
> {code:java}
> ❯ cd hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozone {code}
> edit {color:#00875a}docker-config{color} and add the two configs below, which are
> needed for decommission
>
>
> {code:java}
> OZONE-SITE.XML_ozone.scm.nodes.scmservice=scm
> OZONE-SITE.XML_ozone.scm.address.scmservice.scm=scm {code}
> start the Ozone cluster with 5 datanodes, connect to SCM and create a key
>
> {code:java}
> ❯ docker-compose up --scale datanode=5 -d
> ❯ docker exec -it ozone_scm_1 bash
> bash-4.2$ ozone sh volume create /vol1
> bash-4.2$ ozone sh bucket create /vol1/bucket1
> bash-4.2$ ozone sh key put /vol1/bucket1/key1 /etc/hosts{code}
> close the container and check which datanodes it is on
> {code:java}
> bash-4.2$ ozone admin container close 1
> bash-4.2$ ozone admin container info 1
> ...
> {code}
> check the SCM roles to get the SCM IP and port
>
> {code:java}
> bash-4.2$ ozone admin scm roles
> 99960cfeda73:9894:LEADER:62393063-a1e0-4d5e-bcf5-938cf09a9511:172.25.0.4
> {code}
>
> check the datanode list to get the IP and hostname of the 2 datanodes the
> container is on
>
> {code:java}
> bash-4.2$ ozone admin datanode list
> ... {code}
> decommission both datanodes
>
>
> {code:java}
> bash-4.2$ ozone admin datanode decommission -id=scmservice
> --scm=172.25.0.4:9894 <datanodeIP>/<datanodeHostname> {code}
> Wait until both datanodes are decommissioned. At that point, if we check the
> container's info, we can see that it also has replicas placed on other
> datanodes.
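> 
> Both checks reuse commands already shown above; {color:#00875a}datanode list{color}
> reports each node's operational state and {color:#00875a}container info{color}
> lists the replicas:
> {code:java}
> bash-4.2$ ozone admin datanode list
> bash-4.2$ ozone admin container info 1
> ... {code}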
>
>
> recommission both datanodes
>
> {code:java}
> bash-4.2$ ozone admin datanode recommission -id=scmservice
> --scm=172.25.0.4:9894 <datanodeIP>/<datanodeHostname> {code}
> After a few minutes, the SCM logs show
>
> {code:java}
> 2023-03-15 18:24:53,810 [ReplicationMonitor] INFO
> replication.LegacyReplicationManager: Container #1 is over replicated.
> Expected replica count is 3, but found 5.
> 2023-03-15 18:24:53,810 [ReplicationMonitor] INFO
> replication.LegacyReplicationManager: Sending delete container command for
> container #1 to datanode
> d6461c13-c2fa-4437-94f5-f75010a49069(ozone_datanode_2.ozone_default/172.25.0.11)
> 2023-03-15 18:24:53,811 [ReplicationMonitor] INFO
> replication.LegacyReplicationManager: Sending delete container command for
> container #1 to datanode
> 6b077eea-543b-47ca-abf2-45f26c106903(ozone_datanode_5.ozone_default/172.25.0.6)
> {code}
> connect to one of the datanodes where the container is being deleted
>
> check that the container is deleted
> {code:java}
> bash-4.2$ ls
> /data/hdds/hdds/CID-ca9fef0f-9af2-4dbf-af02-388d624c2f10/current/containerDir0/
> bash-4.2$ {code}
> check RocksDB
> {code:java}
> bash-4.2$ ozone debug ldb --db
> /data/hdds/hdds/CID-ca9fef0f-9af2-4dbf-af02-388d624c2f10/DS-a8a72696-e4cf-42a6-a66c-04f0b614fde4/container.db
> scan --column-family=block_data {code}
> Block data for the deleted container is still there
> {code:java}
> "blockID": {
> "containerBlockID": {
> "containerID": 1,
> "localID": 111677748019200001 {code}
> {color:#00875a}metadata{color} and {color:#00875a}block_data{color} still
> have the entries, while {color:#00875a}deleted_blocks{color} and
> {color:#00875a}delete_txns{color} are empty.
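> 
> For reference, a minimal sketch of the kind of per-container cleanup that
> appears to be missing, assuming Schema V3 keys in the per-volume
> {color:#00875a}container.db{color} are prefixed with the container ID encoded
> as a big-endian long (the prefix encoding and all names below are assumptions,
> not the attached patch):
> {code:java}
> // Hypothetical sketch only: remove a deleted container's entries from every
> // Schema V3 column family with one range delete per family.
> // Assumes keys start with the container ID as a big-endian long; the real
> // Ozone prefix encoding may differ.
> import java.nio.ByteBuffer;
> import java.util.List;
> import org.rocksdb.ColumnFamilyHandle;
> import org.rocksdb.RocksDB;
> import org.rocksdb.RocksDBException;
> 
> public class ContainerPrefixCleanup {
> 
>   private static byte[] prefix(long containerId) {
>     return ByteBuffer.allocate(Long.BYTES).putLong(containerId).array();
>   }
> 
>   public static void deleteContainerEntries(RocksDB db,
>       List<ColumnFamilyHandle> columnFamilies, long containerId)
>       throws RocksDBException {
>     byte[] begin = prefix(containerId);     // inclusive lower bound
>     byte[] end = prefix(containerId + 1);   // exclusive upper bound
>     for (ColumnFamilyHandle cf : columnFamilies) {
>       // deleteRange drops [begin, end) without iterating individual keys
>       db.deleteRange(cf, begin, end);
>     }
>   }
> } {code}
> RocksDB's deleteRange covers the [begin, end) range in a single call, so each
> column family can be cleaned without iterating individual block keys.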