swamirishi commented on PR #6919:
URL: https://github.com/apache/ozone/pull/6919#issuecomment-2224525910

   > Hi everyone, sorry for being late to the party, but let me add try to add 
to the context of this change.
   > 
   > The RequestFeatureValidator annotation is designed to mark a method that 
validates or rewrites a request/response, and either does corrective actions on 
the request/response to ensure compatibility, or returns a failure response in 
case the client can not handle the returned value again in case the client is 
incompatible with the server.
   > 
   > The basic idea behind this is this:
   > 
   > * if an old client sends a request, we may change the request with some 
desired values to conform with the server's new requirements and process the 
request afterwards
   > * if an old client would get a response that it will not be able to 
process correctly, then we can fail early by sending back a failure response, 
or we can rewrite the response so that the old client can process it properly
   > * if a newer client sends a request, it can make it conform with an old 
server on its own.
   > 
   > Just a sidenote: There were some additional thoughts put into cases where 
the a request is initiating a request to an other server and incorporates the 
result to its own results, like when the flow is client->OM->SCM->OM->client 
e.g. block locations in a getFileInfo call, but nothing else was implemented 
outside OM's request handling from the generic framework. There are some 
additional checks added to .*ClientSideTranslatorPB classes or within SCM's 
protocol servers but those were/should have been shortcuts only until this 
framework is done, but we could not get back to finish it so far.
   > 
   > Now that said, during the implementation we came up with an expensive 
initialization of a structure where we have a 3 layered map structure in 
ValidatorRegistry, within which we can look up the validators that needs to be 
applied to the given validation condition, then the given request type, then 
the request processing phase.
   > 
   > Validation condition in this sense is there to separate two cases one is 
important from the perspective of upgrades, the other one is important from the 
perspective of compatibility. New features should be disabled in case the 
cluster is not finalized yet, in this case what we usually want to do, is to 
provide a meaningful error message and reject the request, but there might be 
scenarios where we just want to fall back to the old behaviour until the 
cluster is finalized, and for that we rewrite parts of the requests (there is 
no such use case yet afaik). New features should not be present in responses to 
old clients, and in some cases they need to change requests from old clients in 
order to be processed properly with that ensuring compatibility with old 
clients.
   > 
   > Now let's say we want to process several hundred thousands of requests per 
second, and we want to hook this check into the very beginning of request 
processing or at the very end for response rewrites, we needed a really quick 
way to figure out what to apply. This is why we have the 3 layer map, from 
which we can get the actual list of request feature processors that apply to 
the given request and nothing else, and then we can call these or skip the 
whole check if there isn't any applicable processors. The lookup first 
considers if we are in a pre-finalized state or if we are talking to an older 
client, if none of this is true, then skips the whole lookup. Then as a next 
step the lookup queries the appropriate map based on if we are in a 
pre-finalized state or if an older client sent the request, within that the 
layer 2 map defined for the request type, and within that the list of 
processors for the actual processing phase pre or post.
   > 
   > Note that pre or post processors have a different role and a different 
signature, preprocessors can rewrite a request, post processors can rewrite a 
response. Also note that in a pre-finalized state we do not return the 
processors for the case when an older client sends the request, and we can do 
this because in the pre-finalized state the new features are disabled.
   > 
   > Ok, now as I have set the floor, let's look at the problems we have. The 
biggest problem is that we have added new client versions without thinking 
about what validators have to be added, so with that 
[HDDS-10983](https://issues.apache.org/jira/browse/HDDS-10983) added a custom 
check in the request processing code for the newly added version. This is most 
likely on me, as the request feature validator framework should be tribal 
knowledge, and we even miss a very generic documentation for it. The next 
problem we have is that we were not mindful enough when we added the first 
validations, and there isn't any checks for the actual version of the client in 
the current validators, or in the way how we determine which validations to 
run. And finally we also do not finished this framework and postponed a good 
couple of things, instead we applied shortcuts here and there to solve the 
handling of the changes regarding bucket layouts due to time constraints.
   > 
   > Let's see now the current patch that tries to address the most burning 
problem, namely that we broke functionality by just adding a new client 
version. Given the original intent of the OLDER_CLIENT_REQUESTS 
ValidationCondition, that covers the whole problem space, now with this change, 
we do not cover the case when we have a server that expects 
ClientVersion.EC_REPLICA_INDEX_REQUIRED_IN_BLOCK_REQUEST and a client that has 
ClientVersion.BUCKET_LAYOUT_SUPPORT so even if we write a method and annotate 
it with the RequestFeatureValidator annotation that will not be found and 
applied with the code in the PR. The question may arise why this is not a 
problem with a server expecting ClientVersion.BUCKET_LAYOUT_SUPPORT and a 
client which has ClientVersion.ERASURE_CODING_SUPPORT or 
ClientVersion.VERSION_HANDLES_UNKNOWN_DN_PORTS. This is not a problem in 
general as the 1.2 Ozone release did not have ClientVersion at all, and we 
introduced ClientVersion in 1.3, where we already had the vers
 ion up at BUCKET_LAYOUT_SUPPORT.
   > 
   > With all that said, I agree with @errose28 that this patch is not the one 
we are looking for. It goes against the overall idea and desing of the 
framework, and also it introduces an other problem.
   > 
   > What possible solutions we have? One way is to add a version check to all 
the methods that are annotated with the RequestFeatureValidator annotation, and 
we ensure that these are applied only to the case where the ClientVersion is 
not before the CURRENT_VERSION, but before the BUCKET_LAYOUT_SUPPORT. An other 
way is to introduce a 4th layer in the multi layered map that we use to get the 
applicable validators, and introduce a new mapping based on the maximum version 
for which the validators is applicable, and then when the ValidatorRegistry 
provides the list of applicable validators, it has to concatenate the list of 
provided validators for all the versions that are in between the 
CURRENT_VERSION, and the version the client has.
   > 
   > I tend to believe we should go down the second route where we extend the 
framework to handle the versions properly, but that seems to be a bit more 
involved compared to put if statements to the existing validator methods for 
now, and defer the generic solution when we pick up and finish the work on the 
request feature validation framework. So I am fine with both ways, I am also 
open to other ideas, but I think it is clear why @errose28 and me are 
refraining to approve changing the ValidationCondition itself.
   
   @fapifta I have raised a patch for the same 
https://github.com/apache/ozone/pull/6932
   
   > 
   > It is too late for me to re-read what I wrote down so far, I hope it is 
understandable, in case of something is not understandable, please do not 
hesitate to ask, and apologies for any mistakes that makes understanding and 
reading this comment, also sorry for the long comment I felt that it is 
important that we are on the same page regarding the intents and internals of 
the whole framework.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to