[ 
https://issues.apache.org/jira/browse/HDDS-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644063#comment-17644063
 ] 

István Fajth commented on HDDS-7454:
------------------------------------

As we discussed, the problem is that currently a rogue client can write blocks 
to DataNodes different from the ones in the Pipeline information the client 
received from Ozone Manager. This is true in both secure and non-secure 
environments. As Neil mentioned, this might compromise a container: when SCM 
checks the replicas of an over-replicated container and decides which excess 
replicas to delete, a rogue client that has written the container to 3 nodes 
(even via the STANDALONE replication type) and properly synced these writes 
may push the BCSID associated with the rogue replicas above the one in the 
good replicas; the rogue replicas would then take precedence, and the old 
valid data could potentially be removed.
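To illustrate the hazard, here is a minimal sketch (all names hypothetical; 
this is not the actual SCM replication code) of an excess-replica cleanup 
that keeps the replicas with the highest block commit sequence id (BCSID). 
If rogue replicas were synced past the legitimate ones, the valid replicas 
are exactly the ones selected for deletion:

```java
import java.util.*;

// Hypothetical model of over-replication cleanup that prefers the highest
// block commit sequence id (BCSID); names do not match the real SCM code.
class Replica {
    final String datanode;
    final long bcsid; // BCSID reported for the container on this DN
    Replica(String datanode, long bcsid) {
        this.datanode = datanode;
        this.bcsid = bcsid;
    }
}

public class BcsidPrecedence {
    // Keep the 'replicationFactor' replicas with the highest BCSID and
    // delete the rest -- the behaviour the comment warns about.
    static List<Replica> replicasToDelete(List<Replica> replicas,
                                          int replicationFactor) {
        List<Replica> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingLong((Replica r) -> r.bcsid)
                              .reversed());
        return sorted.subList(Math.min(replicationFactor, sorted.size()),
                              sorted.size());
    }

    public static void main(String[] args) {
        // Three valid replicas synced to BCSID 100, three rogue ones to 150.
        List<Replica> replicas = new ArrayList<>();
        for (String dn : List.of("dn1", "dn2", "dn3"))
            replicas.add(new Replica(dn, 100));
        for (String dn : List.of("dn4", "dn5", "dn6"))
            replicas.add(new Replica(dn, 150));
        // The valid replicas (dn1..dn3) end up on the deletion list.
        for (Replica r : replicasToDelete(replicas, 3))
            System.out.println("delete " + r.datanode + " bcsid=" + r.bcsid);
    }
}
```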

As this can happen in a non-secure environment, I strongly believe we should 
not touch the tokens, as that does not solve the problem at all: tokens are 
present only in a secured environment.

I think the solution is within SCM: if a DN does not yet have a valid replica 
of the container, then at container creation an ICR is triggered, and while 
that ICR is processed the container should be marked as an invalid replica 
and SCM should issue a delete container command to the DataNode that reported 
the invalid container. (We should be able to determine that the container is 
invalid during ICR processing, as SCM knows which container belongs to which 
Pipeline, and if the DN is not part of the Pipeline it should not be 
reporting creation of a container with that container ID.)

If possible, Ozone Manager should also refuse the write and the metadata 
update, based on information provided by SCM (either by caching the in-flight 
write Pipelines and comparing them to the Pipelines reported by the client at 
the end of the write, or by directly checking the write location with SCM to 
validate the write).
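The proposed ICR-time check could be sketched roughly as follows (all class 
and method names are hypothetical and do not match the real 
org.apache.hadoop.hdds.scm classes): on an incoming container report, look up 
the Pipeline the container was allocated on, and if the reporting DataNode is 
not a member, mark the replica invalid and queue a delete container command 
for that DN.

```java
import java.util.*;

// Hypothetical sketch of the proposed ICR-time validation in SCM.
public class IcrValidator {
    // containerId -> datanodes of the Pipeline the container was allocated on
    private final Map<Long, Set<String>> containerToPipelineNodes;
    // delete-container commands queued for datanodes outside the Pipeline
    final List<String> pendingDeletes = new ArrayList<>();

    IcrValidator(Map<Long, Set<String>> containerToPipelineNodes) {
        this.containerToPipelineNodes = containerToPipelineNodes;
    }

    /** Returns true if the reported replica is valid, false if it was
     *  reported by a datanode that is not part of the container's Pipeline. */
    boolean onIncrementalContainerReport(long containerId, String reportingDn) {
        Set<String> members = containerToPipelineNodes.get(containerId);
        if (members == null || !members.contains(reportingDn)) {
            // Invalid replica: this DN should never have created the
            // container, so queue a delete-container command for it.
            pendingDeletes.add(
                "deleteContainer " + containerId + " on " + reportingDn);
            return false;
        }
        return true;
    }
}
```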

We should not include this information in the tokens, I believe, as we gain 
nothing with it once proper measures to deal with such rogue clients are 
implemented. Here is why: if SCM instructs the DN within 2 heartbeats to 
remove the rogue container, then rogue clients have a window of 2 HB (1 
minute by default if no container creation happens between the 2 HB, but it 
usually does, so less than 1 minute) to occupy cluster space with garbage 
data. But to do even that they need access permission in the first place, and 
if they have access permissions they can write garbage to valid locations 
anyway. So the only thing we need to prevent is messing up the container 
space and the OM metadata, and that is done with the proposed check in ICR 
processing and with the check when the client commits the write to OM.
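The OM-side commit check mentioned above could look roughly like this 
(hypothetical names; a sketch of the caching variant, where OM remembers the 
Pipeline handed out at allocateBlock time and compares it to the locations 
the client reports at commit):

```java
import java.util.*;

// Hypothetical sketch of OM refusing a metadata commit when the client
// reports block locations that differ from the Pipeline SCM allocated.
public class OmCommitCheck {
    // blockId -> datanodes of the Pipeline handed out at allocateBlock time
    private final Map<Long, Set<String>> allocatedPipelines = new HashMap<>();

    /** Remember the Pipeline that SCM allocated for this block. */
    void recordAllocation(long blockId, Set<String> pipelineNodes) {
        allocatedPipelines.put(blockId, pipelineNodes);
    }

    /** Accept the commit only if every location the client reports was
     *  part of the Pipeline the block was allocated on. */
    boolean commitBlock(long blockId, Set<String> reportedLocations) {
        Set<String> allocated = allocatedPipelines.get(blockId);
        return allocated != null && allocated.containsAll(reportedLocations);
    }
}
```

The alternative in the comment, checking the write location directly with SCM 
at commit time, would trade this cache for an extra OM-to-SCM round trip.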

> OM to DN token verification should include Pipeline
> ---------------------------------------------------
>
>                 Key: HDDS-7454
>                 URL: https://issues.apache.org/jira/browse/HDDS-7454
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sumit Agrawal
>            Assignee: Sumit Agrawal
>            Priority: Minor
>              Labels: pull-request-available
>
> The client requests block information to be used to write data. In this 
> process:
> - OM calls allocateBlock on SCM; SCM provides the block information, the 
> pipeline and the related DNs
> - OM also creates a token (when security is enabled) with the block 
> information
> - The client passes this information to the DN
> - The DN verifies the token for the block information and starts writing 
> the block
> Here, the pipeline information for which the request was created is not 
> verified. For security, this also needs to be verified.
> The pipeline-to-DN mapping is shared with the DNs via a Pipeline command 
> from SCM to the DNs, CreatePipelineCommand.
> Impact (if the client is not trustable):
> 1. The client can forward the request with the token to a different DN with 
> different pipeline information. Since the DN does not have SCM's mapping of 
> container to pipeline, that DN can start operating on it.
> Having the pipeline in token verification will ensure that:
> - the block write is done with the correct pipeline (DNs)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
