[
https://issues.apache.org/jira/browse/HDDS-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18055304#comment-18055304
]
Ethan Rose commented on HDDS-14417:
-----------------------------------
[~Sammi] I don't think this patch is the correct solution. Blocks can be
allocated by OM for containers that never got created for a variety of reasons,
so it does not fix the issue of orphan entries in SCM for all cases. It also
introduces an orphan data issue because the OM does not know whether or not the
zero-length block actually had some data written to it in the Datanode. The
client's committed length can be any number less than what was actually written.
A better solution seems to be just prohibiting the commit of zero-length blocks
in OM. Empty keys should have empty block lists, so there is no use case for
zero-length blocks. We need OM to delete all blocks it allocated because it
does not know whether or not there was any data written for them. If the
container for these blocks never got created, it will remain stuck in SCM DB,
but this can happen to blocks of any length if the pipeline is closed between
when the blocks were allocated and written.
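
A rough sketch of the check proposed above, to make the idea concrete. The
class, the Block type, and both method names are hypothetical stand-ins and
not existing Ozone code; the real OM commit path and block types may look
quite different. It assumes a block can be identified by a container ID plus
a local ID.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch only; none of these types are Ozone classes. */
final class ZeroLengthBlockCheck {

  /** Simplified stand-in for one entry in a key's block list. */
  static final class Block {
    final long containerId;
    final long localId;
    final long length;

    Block(long containerId, long localId, long length) {
      this.containerId = containerId;
      this.localId = localId;
      this.length = length;
    }

    /** Assumed identity of a block: container ID plus local ID. */
    String id() {
      return containerId + ":" + localId;
    }
  }

  private ZeroLengthBlockCheck() {
  }

  /** Rejects a commit whose block list contains a zero-length block. */
  static void validateCommit(List<Block> committed) {
    for (Block b : committed) {
      if (b.length == 0) {
        throw new IllegalArgumentException("Zero-length block " + b.id()
            + " must not be committed; an empty key should have an empty"
            + " block list");
      }
    }
  }

  /**
   * Returns every allocated block that is absent from the committed list.
   * Length is deliberately ignored, since OM cannot know how much data
   * actually reached the Datanode.
   */
  static List<Block> blocksToDelete(List<Block> allocated,
      List<Block> committed) {
    Set<String> committedIds = new HashSet<>();
    for (Block b : committed) {
      committedIds.add(b.id());
    }
    List<Block> toDelete = new ArrayList<>();
    for (Block b : allocated) {
      if (!committedIds.contains(b.id())) {
        toDelete.add(b);
      }
    }
    return toDelete;
  }
}
```

With this shape, an empty key commits an empty block list, and its
pre-allocated block is still handed to deletion rather than skipped.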
> Skip wrapping allocated but unused blocks for empty file as pseudo file and
> save in deletedTable
> ------------------------------------------------------------------------------------------------
>
> Key: HDDS-14417
> URL: https://issues.apache.org/jira/browse/HDDS-14417
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.2.0
>
>
> During storage capacity feature testing, Prince found that after running the
> following CLI command:
> {code:java}
> "ozone freon ockg -n 100 -s 0 -v vol43 -b bucket -p dir1/dir2"
> {code}
> SCM reports pending block deletion transactions:
> {code:java}
> {
> "totalBlocksize": 26843545600,
> "totalReplicatedBlockSize": 80530636800,
> "totalBlocksCount": 100
> }
> {code}
> Since the files created by ockg are zero-length files, no blocks are expected
> to be pending deletion.
> Investigation of the SCM audit log and log file shows that SCM did receive
> the block deletion requests from OM, so the SCM data above is correct.
> However, there is no related DELETION entry in the OM audit log.
> Further investigation shows that this is related to the block that OM
> pre-allocates during key/file creation. Here is the flow:
> - OM receives a key1 creation request, creates key1, and allocates a new
> block1 for key1
> - OM receives key1's commit request, commits key1, wraps the unused block1
> as a pseudo file key1-p, and puts key1-p into the deletedTable
> - The OM KeyDeletingService scans the deletedTable, finds key1-p, and sends a
> block deletion request to SCM for key1-p
> Since block1 is never used, its container is never created on the DNs, and no
> replica of this container can be found. If a user manually closes this
> container, it will never get a chance to be created, so the block deletion
> for block1 will stay in the SCM DB forever.
> If this container stays open and SCM allocates another block in it, and data
> is written to a DN, the container gets created. The block1 deletion command
> can then finally be executed on the DNs, but it will cause a block file not
> found error or a block metadata not found error.
> So the ideal state is that we don't wrap this allocated but unused block as a
> pseudo file and put it into the deletedTable, because the block doesn't need
> to be deleted: it doesn't exist.