fapifta commented on PR #4278: URL: https://github.com/apache/ozone/pull/4278#issuecomment-1439184020
@adoroszlai it seems that this problem occurs with the DNs in the phase of the test where the cluster is downgraded after an upgrade. The DatanodeIdYaml class reads the DatanodeDetails from the DataNode's yaml file, and it runs into a port that the downgraded version does not recognize.

This should mean that the change in HDDS-5480 (#2452) is also backward incompatible, but it seems that so far the port information has not been saved in the yaml file: even though the ports are set in the DatanodeDetails, the DatanodeDetails is not persisted to the yaml file after the ports are set.

Three things seem to persist the DatanodeDetails:
1. InitDatanodeState
2. SetNodeOperationalStateCommandHandler
3. the certificate client, when a new certificate is persisted to the DN

I am unsure about the initialization phases of the DN, but it seems that in a cluster 1. runs before the ports are set, 2. runs only when a node is decommissioned or offlined and then again when it is recommissioned, while 3. also runs before the ports are set.

In this change, though, the HTTP and HTTPS ports are set before InitDatanodeState runs, so they are saved, and that is why we run into the problem. However, this means there is a sequence of events that causes backward incompatibility for all the ports that were added after V0_PORTS (upgrade -> decomm/offline a node -> recomm the node -> downgrade).

I think we might need to solve this similarly to how it was solved for the client, but this time we need to introduce the handling of unknown ports in the DatanodeIdYaml load logic instead of in the client. As this was not discovered earlier, the incompatibility remains for the REPLICATION, RATIS_ADMIN and RATIS_SERVER ports in the mentioned scenario, while we can handle it for the RATIS_DATASTREAM port, as it was introduced after 1.3.0.
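To make the idea of handling unknown ports at load time more concrete, here is a minimal, self-contained sketch. It is not the actual DatanodeIdYaml code: the class `TolerantPortLoader`, the enum `KnownPort`, the method `loadPorts`, and the port numbers are all hypothetical and only stand in for "the port names an older version knows" and "the name-to-port map read from the yaml file". The point is only that a port name the running version does not recognize gets skipped instead of failing the whole load.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only, not the real DatanodeIdYaml load logic.
class TolerantPortLoader {

  // Hypothetical set of port names known to the older (downgraded) version.
  enum KnownPort { STANDALONE, RATIS, REST }

  // Converts the raw name -> port map read from the yaml file into known
  // ports, ignoring any name this version does not recognize.
  static Map<KnownPort, Integer> loadPorts(Map<String, Integer> yamlPorts) {
    Map<KnownPort, Integer> ports = new EnumMap<>(KnownPort.class);
    for (Map.Entry<String, Integer> entry : yamlPorts.entrySet()) {
      try {
        ports.put(KnownPort.valueOf(entry.getKey()), entry.getValue());
      } catch (IllegalArgumentException e) {
        // Enum.valueOf throws for an unknown name, e.g. a port written by a
        // newer version. Skip it so the rest of the file still loads.
      }
    }
    return ports;
  }

  public static void main(String[] args) {
    // Illustrative content of a yaml file written by a newer version.
    Map<String, Integer> yamlPorts = new LinkedHashMap<>();
    yamlPorts.put("STANDALONE", 9859);
    yamlPorts.put("RATIS", 9858);
    yamlPorts.put("HTTP", 9882); // unknown to the older version in this sketch
    System.out.println(loadPorts(yamlPorts));
  }
}
```

Skipping the entry silently is just one option in this sketch; a real fix would probably want to at least log the ignored port name, and whether dropping it is safe on a later re-upgrade would need to be checked separately.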
@adoroszlai if you agree with my analysis, I will set up the corresponding JIRAs to fix and handle these, but I would not like to run ahead and do it without consensus on how to solve the problem. Until then, @debiswal, we should keep this one open, as we need to find out how we want to handle the compatibility aspect of adding these new ports.
