[jira] [Commented] (HDDS-6721) Ozone key deletion often does not delete the real data blocks in datanode

Ethan Rose (Jira) Mon, 16 May 2022 10:22:06 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537675#comment-17537675
 ]


Ethan Rose commented on HDDS-6721:
----------------------------------

After some offline discussion, we found two reasons the blocks are not getting 
deleted:
1. The blocks were in open containers, which do not process block deletions 
until they are closed.
2. The block deletions existed in the SCM Ratis log, but had not been flushed 
to the database, which happens only on a snapshot (every 1000 transactions). 
As a result, the SCM's deletion scanner was not picking them up and sending 
them back to the datanodes. This was worked around by restarting the SCM.

We would like to fix issue 2, since SCM may not quickly make progress to 1000 
transactions in a stable cluster that is mostly serving reads. One option is to 
introduce a time-based Ratis snapshot. cc [~szetszwo] for input on whether we 
could add this on the Ratis side.
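
To make the time-based option concrete, here is a minimal sketch of a trigger 
that takes a snapshot when either the transaction count or an elapsed-time 
limit is reached. This is not Ratis or Ozone code: the class name, the method 
names, and the 60-second interval used in the usage note are assumptions for 
illustration only. The 1000-transaction threshold is the one mentioned above.

```java
// Hypothetical sketch, not an existing Ratis or Ozone API.
final class TimeOrCountSnapshotPolicy {
  private final long txnThreshold;       // count-based trigger, e.g. 1000
  private final long maxIntervalMillis;  // assumed new time-based trigger
  private long lastSnapshotIndex;        // log index covered by the last snapshot
  private long lastSnapshotTimeMillis;   // when the last snapshot was taken

  TimeOrCountSnapshotPolicy(long txnThreshold, long maxIntervalMillis,
                            long startIndex, long startTimeMillis) {
    this.txnThreshold = txnThreshold;
    this.maxIntervalMillis = maxIntervalMillis;
    this.lastSnapshotIndex = startIndex;
    this.lastSnapshotTimeMillis = startTimeMillis;
  }

  /** Returns true if a snapshot should be taken now. */
  boolean shouldSnapshot(long lastAppliedIndex, long nowMillis) {
    long pending = lastAppliedIndex - lastSnapshotIndex;
    if (pending <= 0) {
      return false;  // nothing new to flush, so a snapshot would be a no-op
    }
    if (pending >= txnThreshold) {
      return true;   // existing behavior: enough transactions accumulated
    }
    // New behavior: a few transactions have waited too long.
    return nowMillis - lastSnapshotTimeMillis >= maxIntervalMillis;
  }

  /** Records that a snapshot covering the given index was taken. */
  void onSnapshotTaken(long snapshotIndex, long nowMillis) {
    this.lastSnapshotIndex = snapshotIndex;
    this.lastSnapshotTimeMillis = nowMillis;
  }
}
```

With a threshold of 1000 and an assumed interval of 60 seconds, five pending 
block deletions would be flushed after at most a minute instead of waiting for 
995 more transactions, while an idle SCM with nothing pending would take no 
snapshot at all.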

> Ozone key deletion often does not delete the real data blocks in datanode
> -------------------------------------------------------------------------
>
>                 Key: HDDS-6721
>                 URL: https://issues.apache.org/jira/browse/HDDS-6721
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Shawn
>            Priority: Major
>
> When I delete a key in Ozone (using either the ozone CLI or the aws CLI), it 
> seems Ozone does not clean up the real data blocks on the datanodes. The key 
> is gone at the metadata layer (OM), but not at the datanode layer.
> I did the following investigation. 
> OM layer: OM seems to function correctly. I do see the log saying it deletes 
> the block. Also, the RocksDB table deletedTable is empty.
> SCM layer: it does receive the deletion command from OM, and it tries to 
> delete the block. But it seems it only adds the transaction in memory and 
> does not record the transaction in RocksDB, and thus it does not call the 
> datanode to do the deletion. 
> * No log like "Totally added x blocks to be deleted for y datanodes" appears 
> on the SCM leader node. 
> * The deletedBlocks RocksDB table does not have the info of the containerID 
> being deleted. 
> * The audit log has the entry "2022-05-09 18:18:46,553 | INFO  | 
> SCMAudit | user=om/[email protected] | ip=100.116.76.9 | 
> op=DELETE_KEY_BLOCK 
> {KeyBlockToDelete=BlockGroup[groupID='/s3v/shawn-test2/tmp/pvc.yaml', 
> blockIDs=[conID: 9001 locID: 109611004723209001 bcsId: 0]]} | ret=SUCCESS |"
> Datanode layer: it does not receive any command from SCM to delete the 
> blocks. The blocks of the container in question still exist on disk. The 
> RocksDB table delete_txns of that container is empty.
> I also tried restarting the SCM to flush the Ratis logs, but the result was 
> the same as above.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

