fapifta commented on PR #4278:
URL: https://github.com/apache/ozone/pull/4278#issuecomment-1439184020

   @adoroszlai 
   it seems that this problem occurs with the DNs in the test phase where the 
cluster is downgraded after an upgrade.
   The DataNodeIDYaml class reads the DataNodeDetails from the DataNode's yaml 
file and runs into a port that the downgraded version does not recognize.
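   
   A minimal sketch of that failure mode, assuming the older reader resolves 
port names with a strict enum lookup. `OldPort`, `StrictPortLoader`, and the 
port numbers are hypothetical stand-ins for illustration, not the actual Ozone 
code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical: the port names an older release knows about.
enum OldPort { STANDALONE, RATIS, REST }

final class StrictPortLoader {
  private StrictPortLoader() { }

  // Resolves every name strictly; a single unknown name aborts the whole load.
  static void load(Map<String, Integer> rawPorts) {
    for (String name : rawPorts.keySet()) {
      OldPort.valueOf(name);  // throws IllegalArgumentException if unknown
    }
  }

  public static void main(String[] args) {
    // Stand-in for the name -> port entries read from the yaml file.
    Map<String, Integer> raw = new LinkedHashMap<>();
    raw.put("STANDALONE", 9859);
    raw.put("HTTP", 9882);  // persisted by the upgraded version
    try {
      load(raw);
    } catch (IllegalArgumentException e) {
      System.out.println("Load failed: " + e.getMessage());
    }
  }
}
```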
   
   This should mean that the change in HDDS-5480 (#2452) is also backward 
incompatible. However, it seems that the port information has not been saved 
in the yaml file so far: even though the ports are set in the DataNodeDetails, 
the DataNodeDetails is not persisted to the yaml file after the ports are set.
   Three things seem to persist the DataNodeDetails:
   1. InitDatanodeState
   2. SetNodeOperationalStateCommandHandler
   3. and the certificate client when a new certificate is persisted to the DN
   
   I am unsure about the initialization phases of the DN, but it seems that in 
a cluster 1. runs before the ports are set, 2. runs only when a node is 
decommissioned or offlined, and again when it is recommissioned, while 3. also 
runs before the ports are set.
   
   In this change, though, the HTTP and HTTPS ports are set before 
InitDatanodeState runs, so they are saved, and that is why we run into the 
problem. However, by the same logic there is a sequence of events that causes 
backward incompatibility for all the ports that were added after V0_PORTS 
(upgrade -> decomm/offline a node -> recomm the node -> downgrade).
   
   I think we might need to solve this similarly to how it was solved for the 
client, but this time we need to handle unknown ports in the DatanodeIdYaml 
load logic instead of in the client. As this was not discovered earlier, the 
incompatibility remains for the REPLICATION, RATIS_ADMIN, and RATIS_SERVER 
ports in the mentioned scenario, while we can handle it for the 
RATIS_DATASTREAM port, as it was introduced after 1.3.0.
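   
   One possible shape for such tolerant load logic, as a sketch only. 
`KnownPort` and `TolerantPortLoader` are hypothetical names, and the input map 
stands in for the name -> port entries read from the yaml file; the real 
DatanodeIdYaml code may be structured differently:

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical: the port names known to the version doing the load.
enum KnownPort { STANDALONE, RATIS, REST }

final class TolerantPortLoader {
  private TolerantPortLoader() { }

  // Converts raw name -> port entries into typed entries, skipping names this
  // version does not know instead of failing the whole load.
  static Map<KnownPort, Integer> load(Map<String, Integer> rawPorts) {
    Map<KnownPort, Integer> ports = new EnumMap<>(KnownPort.class);
    for (Map.Entry<String, Integer> e : rawPorts.entrySet()) {
      try {
        ports.put(KnownPort.valueOf(e.getKey()), e.getValue());
      } catch (IllegalArgumentException unknown) {
        // Written by a newer version; ignore so that a downgrade can proceed.
        System.err.println("Ignoring unknown port name: " + e.getKey());
      }
    }
    return ports;
  }

  public static void main(String[] args) {
    Map<String, Integer> raw = new LinkedHashMap<>();
    raw.put("STANDALONE", 9859);
    raw.put("HTTP", 9882);  // added by the newer version
    System.out.println(load(raw));
  }
}
```

   Skipping the entry keeps the downgraded DN able to start, at the cost of 
dropping the unknown port from anything that version writes back to the file 
afterwards.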
   
   @adoroszlai if you agree with my analysis, I will set up the corresponding 
JIRAs to fix and handle these, but I would not like to run ahead and do it 
without consensus on how to solve the problem. Until then, @debiswal, we 
should keep this one open, as we need to find out how we want to handle the 
compatibility aspect of adding these new ports.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

