[ https://issues.apache.org/jira/browse/HDDS-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384986#comment-17384986 ]

Ethan Rose edited comment on HDDS-5432 at 7/21/21, 4:39 PM:
------------------------------------------------------------

Before SCM HA, datanode volume directories were in the format 
<volume>/hdds/<scm-id>. After SCM HA, there could be multiple SCMs with 
different IDs, so the directory structure on datanodes was changed to 
<volume>/hdds/<cluster-id> in HDDS-4866. HDDS-4866 reformats datanode volumes 
from the old format to the new format by renaming the SCM ID directory and 
rewriting every container file within it, because container files contain 
absolute paths to the chunk and metadata directories. This is done when the 
datanode first starts with the new bits, so it is a backwards-incompatible 
change for downgrades. The upgrade framework was intended to handle this by 
keeping the original SCM ID format while pre-finalized. When the datanodes 
finalized, they would perform these rewrite steps and start using the cluster 
ID instead of the SCM ID. However, after some offline discussion, we realized 
that because the upgrade framework protects datanodes only against writes 
during an upgrade, not against reads, we cannot safely rewrite volumes and 
containers this way at finalization, when the datanode is already running and 
talking to SCM. It can only be done safely at startup, as in HDDS-4866, but 
that does not allow us to support downgrades to 1.1.0.
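To make the layout change concrete, here is a minimal Python sketch (not 
Ozone's actual code) of what the startup reformat has to do to a single path: 
swap the SCM ID component under <volume>/hdds for the cluster ID. The helper 
name to_cluster_id_layout, the IDs, and the sub-path below the SCM ID 
directory are hypothetical and only illustrative. Because container files 
store absolute chunk and metadata paths, every such path recorded inside them 
would need the same substitution, which is why renaming the directory alone 
is not enough.

```python
from pathlib import PurePosixPath


def to_cluster_id_layout(path: str, volume: str, scm_id: str, cluster_id: str) -> str:
    """Map a path under <volume>/hdds/<scm-id> to <volume>/hdds/<cluster-id>.

    Hypothetical helper for illustration only; paths outside the old
    SCM ID directory are returned unchanged.
    """
    old_root = PurePosixPath(volume, "hdds", scm_id)
    new_root = PurePosixPath(volume, "hdds", cluster_id)
    p = PurePosixPath(path)
    try:
        relative = p.relative_to(old_root)
    except ValueError:
        # Not under the old SCM ID directory: nothing to rewrite.
        return path
    return str(new_root / relative)


# An absolute chunk path as it might be recorded in a container file
# (the sub-path and IDs are made up for this example).
old_chunks = "/data/disk1/hdds/scm-1234/current/containerDir0/1/chunks"
print(to_cluster_id_layout(old_chunks, "/data/disk1", "scm-1234", "CID-5678"))
# prints /data/disk1/hdds/CID-5678/current/containerDir0/1/chunks
```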

One proposed solution was, given a volume in the old format 
<volume>/hdds/<scm-id>, to create a symlink <volume>/hdds/<cluster-id> pointing 
to <volume>/hdds/<scm-id>. This would allow 1.2.0 to read via the cluster ID 
path, and after a downgrade, 1.1.0 could still read through the SCM ID path. 
Container files would not need to be rewritten, since their original SCM ID 
paths would still be valid. The problem with this approach is that 1.1.0 
requires every <volume>/hdds directory to contain only one subdirectory, so a 
symlink placed there by 1.2.0 would still cause downgrades to 1.1.0 to fail. 
All other solutions discussed involve rewriting the container files while the 
datanode is running during finalization, which is not safe given the current 
upgrade framework implementation, as discussed above.
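The symlink idea, and why it alone breaks 1.1.0, can be sketched in Python. 
This is an illustration under stated assumptions, not Ozone code: 
add_cluster_id_link and passes_single_subdir_check are hypothetical helpers, 
the latter only approximating 1.1.0's requirement that <volume>/hdds contain a 
single entry, and the IDs are placeholders.

```python
import os
import tempfile


def add_cluster_id_link(volume: str, scm_id: str, cluster_id: str) -> str:
    """Create <volume>/hdds/<cluster-id> as a symlink to the existing
    <volume>/hdds/<scm-id> directory (hypothetical helper)."""
    link = os.path.join(volume, "hdds", cluster_id)
    # Assumption: a relative target, resolved against <volume>/hdds, so the
    # link stays valid even if the volume is mounted at a different path.
    os.symlink(scm_id, link, target_is_directory=True)
    return link


def passes_single_subdir_check(volume: str) -> bool:
    """Approximates the 1.1.0 rule that <volume>/hdds holds only one entry.
    A symlink counts as an entry just like a real directory."""
    return len(os.listdir(os.path.join(volume, "hdds"))) == 1


with tempfile.TemporaryDirectory() as volume:
    os.makedirs(os.path.join(volume, "hdds", "scm-1234", "current"))
    print(passes_single_subdir_check(volume))  # True: old layout only
    link = add_cluster_id_link(volume, "scm-1234", "CID-5678")
    # 1.2.0 can read through the cluster ID path...
    print(os.path.isdir(os.path.join(link, "current")))  # True
    # ...but 1.1.0 would now see two entries under <volume>/hdds.
    print(passes_single_subdir_check(volume))  # False
```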

In light of this, the best way forward seems to be the symlink approach, with 
the requirement that the cluster ID symlink be deleted when a datanode is 
downgraded, so that 1.1.0 sees only one subdirectory of <volume>/hdds. The 
upgrade framework has no downgrade hook: the pre-finalized new version is 
simply stopped and the old version started. Therefore this symlink removal 
step has to be done manually on every datanode. We will provide a datanode 
helper command that must be run on every datanode after downgrading from 1.2.0 
and before starting it back up on 1.1.0. On each datanode, this command will 
visit every volume and delete the cluster ID symlink. This approach will be 
tested using the upgrade acceptance tests.
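A minimal sketch of the per-datanode cleanup, assuming the volume roots and the 
cluster ID are passed in. The real helper command's name, options, and how it 
discovers volumes are not specified here; remove_cluster_id_links is a 
hypothetical function. It only unlinks the entry when it is a symlink and 
refuses to touch a real directory, so a mistake cannot delete container data.

```python
import os
import tempfile
from typing import Iterable, List


def remove_cluster_id_links(volumes: Iterable[str], cluster_id: str) -> List[str]:
    """Delete <volume>/hdds/<cluster-id> on each volume if it is a symlink.

    Returns the links that were removed. Real directories are never removed,
    since they may hold container data rather than point at the SCM ID dir.
    """
    removed = []
    for volume in volumes:
        link = os.path.join(volume, "hdds", cluster_id)
        if os.path.islink(link):
            os.unlink(link)  # removes the link only; the SCM ID dir is untouched
            removed.append(link)
        elif os.path.exists(link):
            raise RuntimeError(f"{link} is not a symlink; refusing to delete it")
    return removed


# Demo on a throwaway volume laid out the way 1.2.0 would leave it.
with tempfile.TemporaryDirectory() as volume:
    hdds = os.path.join(volume, "hdds")
    os.makedirs(os.path.join(hdds, "scm-1234"))
    os.symlink("scm-1234", os.path.join(hdds, "CID-5678"))
    print(remove_cluster_id_links([volume], "CID-5678"))
    print(os.listdir(hdds))  # ['scm-1234']: the single entry 1.1.0 expects
```

Running it a second time is a no-op that returns an empty list, so it is safe 
to re-run on a datanode if the first attempt is interrupted.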



> Enable downgrade testing after 1.1.0 release
> --------------------------------------------
>
>                 Key: HDDS-5432
>                 URL: https://issues.apache.org/jira/browse/HDDS-5432
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Assignee: Ethan Rose
>            Priority: Blocker
>
> Now that Ozone 1.1.0 has been released, we can use its docker image to test 
> downgrades from this version of Ozone (1.2) to 1.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
