[
https://issues.apache.org/jira/browse/HDDS-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384986#comment-17384986
]
Ethan Rose edited comment on HDDS-5432 at 7/21/21, 4:39 PM:
------------------------------------------------------------
Before SCM HA, datanode volume directories were in the format
<volume>/hdds/<scm-id>. After SCM HA, there could be multiple SCMs with
different IDs, so the directory structure on datanodes was changed to
<volume>/hdds/<cluster-id> in HDDS-4866. This Jira reformats datanode volumes
from the old format to the new format by renaming the SCM ID directory, and
rewriting all container files within the directory because they contain
absolute paths to chunks and metadata. This is done on startup of the datanode
with the new bits, so it is a backwards-incompatible change for downgrades. The
upgrade framework was intended to handle this by using the original SCM ID
format in pre-finalize, and when the datanodes finalize, they would perform
these rewrite steps and start using cluster ID instead of SCM ID. However,
after some offline discussion, we realized that since the upgrade framework
does not provide protection against reads on datanodes during upgrade (only
write protection), we cannot safely rewrite volumes and containers like this
during finalize, when the datanode is already running and talking to SCM. It
can only be safely done on startup like in HDDS-4866, but this will not allow
us to support downgrades to 1.1.0.
One proposed solution was, given a volume in the old format
<volume>/hdds/<scm-id>, to create a symlink <volume>/hdds/<cluster-id> pointing to
<volume>/hdds/<scm-id>. This would allow 1.2.0 to read via the cluster ID path,
and after downgrade, 1.1.0 could still read through the SCM ID path. Container
files would not need to be rewritten since their original SCM ID paths would
still be valid. The issue with this approach is that 1.1.0 requires that all
<volume>/hdds directories contain only one subdirectory, so placing this
symlink here in 1.2.0 will still fail downgrades for 1.1.0. All other solutions
discussed involve rewriting the container files while the datanode is running
during finalize, which is not safe given the current upgrade framework
implementation as discussed above.
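
To make the symlink proposal and its conflict with 1.1.0 concrete, here is a
minimal Python sketch. It is not Ozone code: the function names are
hypothetical, and passes_single_subdir_check is only a stand-in for the 1.1.0
requirement described above, assuming that requirement counts any entry that
resolves to a directory (including a symlink to one).

```python
import os


def link_cluster_id(hdds_dir, scm_id, cluster_id):
    """Create <hdds_dir>/<cluster_id> as a symlink to <hdds_dir>/<scm_id>.

    Hypothetical helper for illustration; names are not from Ozone.
    """
    target = os.path.join(hdds_dir, scm_id)
    link = os.path.join(hdds_dir, cluster_id)
    # lexists() is true for the link itself, even if its target is missing.
    if not os.path.lexists(link):
        os.symlink(target, link, target_is_directory=True)
    return link


def passes_single_subdir_check(hdds_dir):
    """Stand-in for the 1.1.0 rule: exactly one subdirectory under hdds.

    Assumption: a symlink that resolves to a directory counts as one.
    """
    subdirs = [
        entry for entry in os.listdir(hdds_dir)
        if os.path.isdir(os.path.join(hdds_dir, entry))  # follows symlinks
    ]
    return len(subdirs) == 1
```

Under these assumptions, a volume passes the check before the symlink is
created and fails it afterwards, while the container data stays reachable
through both the SCM ID path and the cluster ID path.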
In light of this, it seems the best way forward will be to use the symlink
approach, but require the cluster ID symlink to be deleted during a datanode
downgrade so that 1.1.0 will only see one subdirectory of <volume>/hdds. The
upgrade framework has no downgrade hook: the pre-finalized new version is
simply stopped and the old version started. Therefore this symlink removal step will
have to be done manually for every datanode. We will provide a datanode helper
command that will need to be run on every datanode before starting 1.1.0 back
up after downgrading from 1.2.0. This command will go to every volume on every
datanode and delete the cluster ID symlink. This approach will be tested using
the upgrade acceptance tests.
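
The cleanup step could look roughly like the following Python sketch. It is
only an illustration of the idea, not the actual helper command: the function
name, its arguments, and the way the volume list and cluster ID are obtained
are all assumptions.

```python
import os


def remove_cluster_id_symlinks(volume_roots, cluster_id):
    """Delete <volume>/hdds/<cluster_id> on each volume if it is a symlink.

    Hypothetical helper for illustration. Real directories are never
    touched, so the SCM ID directory and its containers are left intact.
    """
    removed = []
    for volume in volume_roots:
        link = os.path.join(volume, "hdds", cluster_id)
        # islink() does not follow the link, so a real directory with the
        # cluster ID name (a volume already in the new format) is skipped.
        if os.path.islink(link):
            os.unlink(link)  # removes only the link, not its target
            removed.append(link)
    return removed
```

Returning the list of removed links gives the operator something to log or
verify before starting 1.1.0 on that datanode.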
> Enable downgrade testing after 1.1.0 release
> --------------------------------------------
>
> Key: HDDS-5432
> URL: https://issues.apache.org/jira/browse/HDDS-5432
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Assignee: Ethan Rose
> Priority: Blocker
>
> Now that Ozone 1.1.0 has been released, we can use its docker image to test
> downgrades from this version of Ozone (1.2) to 1.1.