swamirishi commented on PR #6919: URL: https://github.com/apache/ozone/pull/6919#issuecomment-2224525910
> Hi everyone, sorry for being late to the party, but let me add try to add to the context of this change. > > The RequestFeatureValidator annotation is designed to mark a method that validates or rewrites a request/response, and either does corrective actions on the request/response to ensure compatibility, or returns a failure response in case the client can not handle the returned value again in case the client is incompatible with the server. > > The basic idea behind this is this: > > * if an old client sends a request, we may change the request with some desired values to conform with the server's new requirements and process the request afterwards > * if an old client would get a response that it will not be able to process correctly, then we can fail early by sending back a failure response, or we can rewrite the response so that the old client can process it properly > * if a newer client sends a request, it can make it conform with an old server on its own. > > Just a sidenote: There were some additional thoughts put into cases where the a request is initiating a request to an other server and incorporates the result to its own results, like when the flow is client->OM->SCM->OM->client e.g. block locations in a getFileInfo call, but nothing else was implemented outside OM's request handling from the generic framework. There are some additional checks added to .*ClientSideTranslatorPB classes or within SCM's protocol servers but those were/should have been shortcuts only until this framework is done, but we could not get back to finish it so far. > > Now that said, during the implementation we came up with an expensive initialization of a structure where we have a 3 layered map structure in ValidatorRegistry, within which we can look up the validators that needs to be applied to the given validation condition, then the given request type, then the request processing phase. > > Validation condition in this sense is there to separate two cases one is important from the perspective of upgrades, the other one is important from the perspective of compatibility. New features should be disabled in case the cluster is not finalized yet, in this case what we usually want to do, is to provide a meaningful error message and reject the request, but there might be scenarios where we just want to fall back to the old behaviour until the cluster is finalized, and for that we rewrite parts of the requests (there is no such use case yet afaik). New features should not be present in responses to old clients, and in some cases they need to change requests from old clients in order to be processed properly with that ensuring compatibility with old clients. > > Now let's say we want to process several hundred thousands of requests per second, and we want to hook this check into the very beginning of request processing or at the very end for response rewrites, we needed a really quick way to figure out what to apply. This is why we have the 3 layer map, from which we can get the actual list of request feature processors that apply to the given request and nothing else, and then we can call these or skip the whole check if there isn't any applicable processors. The lookup first considers if we are in a pre-finalized state or if we are talking to an older client, if none of this is true, then skips the whole lookup. Then as a next step the lookup queries the appropriate map based on if we are in a pre-finalized state or if an older client sent the request, within that the layer 2 map defined for the request type, and within that the list of processors for the actual processing phase pre or post. > > Note that pre or post processors have a different role and a different signature, preprocessors can rewrite a request, post processors can rewrite a response. Also note that in a pre-finalized state we do not return the processors for the case when an older client sends the request, and we can do this because in the pre-finalized state the new features are disabled. > > Ok, now as I have set the floor, let's look at the problems we have. The biggest problem is that we have added new client versions without thinking about what validators have to be added, so with that [HDDS-10983](https://issues.apache.org/jira/browse/HDDS-10983) added a custom check in the request processing code for the newly added version. This is most likely on me, as the request feature validator framework should be tribal knowledge, and we even miss a very generic documentation for it. The next problem we have is that we were not mindful enough when we added the first validations, and there isn't any checks for the actual version of the client in the current validators, or in the way how we determine which validations to run. And finally we also do not finished this framework and postponed a good couple of things, instead we applied shortcuts here and there to solve the handling of the changes regarding bucket layouts due to time constraints. > > Let's see now the current patch that tries to address the most burning problem, namely that we broke functionality by just adding a new client version. Given the original intent of the OLDER_CLIENT_REQUESTS ValidationCondition, that covers the whole problem space, now with this change, we do not cover the case when we have a server that expects ClientVersion.EC_REPLICA_INDEX_REQUIRED_IN_BLOCK_REQUEST and a client that has ClientVersion.BUCKET_LAYOUT_SUPPORT so even if we write a method and annotate it with the RequestFeatureValidator annotation that will not be found and applied with the code in the PR. The question may arise why this is not a problem with a server expecting ClientVersion.BUCKET_LAYOUT_SUPPORT and a client which has ClientVersion.ERASURE_CODING_SUPPORT or ClientVersion.VERSION_HANDLES_UNKNOWN_DN_PORTS. This is not a problem in general as the 1.2 Ozone release did not have ClientVersion at all, and we introduced ClientVersion in 1.3, where we already had the vers ion up at BUCKET_LAYOUT_SUPPORT. > > With all that said, I agree with @errose28 that this patch is not the one we are looking for. It goes against the overall idea and desing of the framework, and also it introduces an other problem. > > What possible solutions we have? One way is to add a version check to all the methods that are annotated with the RequestFeatureValidator annotation, and we ensure that these are applied only to the case where the ClientVersion is not before the CURRENT_VERSION, but before the BUCKET_LAYOUT_SUPPORT. An other way is to introduce a 4th layer in the multi layered map that we use to get the applicable validators, and introduce a new mapping based on the maximum version for which the validators is applicable, and then when the ValidatorRegistry provides the list of applicable validators, it has to concatenate the list of provided validators for all the versions that are in between the CURRENT_VERSION, and the version the client has. > > I tend to believe we should go down the second route where we extend the framework to handle the versions properly, but that seems to be a bit more involved compared to put if statements to the existing validator methods for now, and defer the generic solution when we pick up and finish the work on the request feature validation framework. So I am fine with both ways, I am also open to other ideas, but I think it is clear why @errose28 and me are refraining to approve changing the ValidationCondition itself. @fapifta I have raised a patch for the same https://github.com/apache/ozone/pull/6932 > > It is too late for me to re-read what I wrote down so far, I hope it is understandable, in case of something is not understandable, please do not hesitate to ask, and apologies for any mistakes that makes understanding and reading this comment, also sorry for the long comment I felt that it is important that we are on the same page regarding the intents and internals of the whole framework. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
