errose28 commented on a change in pull request #2056:
URL: https://github.com/apache/ozone/pull/2056#discussion_r608223192
##########
File path:
hadoop-ozone/dist/src/main/compose/upgrade/upgrades/manual-upgrade/test.sh
##########
@@ -20,32 +20,37 @@
set -e -o pipefail
-COMPOSE_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-export COMPOSE_DIR
-
-: "${OZONE_REPLICATION_FACTOR:=3}"
-: "${OZONE_UPGRADE_FROM:="0.5.0"}"
-: "${OZONE_UPGRADE_TO:="1.0.0"}"
-: "${OZONE_VOLUME:="${COMPOSE_DIR}/data"}"
-
-export OZONE_REPLICATION_FACTOR OZONE_UPGRADE_FROM OZONE_UPGRADE_TO
OZONE_VOLUME
-
-source "${COMPOSE_DIR}/testlib.sh"
-
-create_data_dir
-
-prepare_for_binary_image "${OZONE_UPGRADE_FROM}"
-load_version_specifics "${OZONE_UPGRADE_FROM}"
-first_run
-unload_version_specifics
-
-from=$(get_logical_version "${OZONE_UPGRADE_FROM}")
-to=$(get_logical_version "${OZONE_UPGRADE_TO}")
-execute_upgrade_steps "$from" "$to"
-
-prepare_for_binary_image "${OZONE_UPGRADE_TO}"
-load_version_specifics "${OZONE_UPGRADE_TO}"
-second_run
-unload_version_specifics
-
-generate_report
+# Fail if required vars are not set.
Review comment:
I see your point. A name like `test.sh` seems to imply that you can run
it directly and it will do the testing just fine, since this is the pattern
with the other acceptance tests. Perhaps renaming to `driver.sh` would be
better, since they are driving the specified upgrade flow? I would like to use
the `run_test_script` functionality here, however. Maybe that function can be
modified to read from a variable specifying the name of the file to use.
I do think the variables should be required to be defined here. I don't
think there are sensible defaults so I would rather fail with a clear error
than do something unintentional if the test is incorrectly set up.
Let me know your thoughts.
##########
File path: hadoop-ozone/dist/src/main/compose/upgrade/README.md
##########
@@ -12,18 +12,91 @@
limitations under the License. See accompanying LICENSE file.
-->
-# Compose file for upgrade
+# Ozone Upgrade Acceptance Tests
-This directory contains a sample cluster definition and script for
-testing upgrade from previous version to the current one.
+This directory contains cluster definitions and scripts for testing upgrades
from any previous version to another
+previous version, or to the local build of the code. It is designed to catch
backwards incompatible changes made between
+an older release of Ozone and a later release (which may be the local build).
-Data for each container is persisted in mounted volume (by default it's
-`data` under the `compose/upgrade` directory, but can be overridden via
-`OZONE_VOLUME` environment variable).
+## IMPORTANT NOTES
-Prior version is run using an official `apache/ozone` image, while the
-current version is run with the `ozone-runner` image using locally built
-source code.
+1. Backwards Incompatibility
+ - These tests will not catch backwards incompatible changes against
commits in between releases.
+ - Example:
+ 1. After 1.0.0, a change *c1* is made that is backwards compatible
with *1.0.0*.
+ 2. After *c1*, a new change *c2* is made that is also backwards
compatible with 1.0.0 but backwards *incompatible* with *c1*.
-Currently the test script only supports a single version upgrade (eg.
-from 0.5.0 to 1.0.0).
+ - This test suite will not raise an error for *c2*, because it
only tests against the last release
+ (1.0.0), and not the last commit (*c1*).
+
+2. Downgrade Support
+ - Downgrades will not be supported until upgrading from 1.1.0 to 1.2.0
+
+ - Until 1.1.0 is released, downgrades cannot be tested, so they are
commented out of the current non-rolling upgrade tests.
+
+## Directory Layout
+
+### upgrades
+
+Each type of upgrade has a subdirectory under the *upgrades* directory. Each
upgrade's steps are controlled by a *test.sh* script in its
*upgrades/\<upgrade-type>* directory. Callbacks to execute throughout the
upgrade are called by this script and should be placed in a file called
*callback.sh* in the *upgrades/\<upgrade-type>/\<upgrade-from>-\<upgrade-to>*
directory. After the test is run, results and docker volume data for the
upgrade for these versions will also be placed in this directory. The results
of all upgrades run as part of the tests will be placed in a *results* folder
in the top level upgrade directory.
+
+#### manual-upgrade
+
+- Any necessary conversion of on disk structures from the old version to the
new version must be done explicitly.
+
+- This is primarily for testing upgrades from versions before the non-rolling
upgrade framework was introduced.
+
+- Supported Callbacks:
+ 1. `setup_with_old_version`: Run before ozone is started in the old
version.
+ 3. `with_old_version`: Run while ozone is running in the old version.
+ 3. `setup_with_new_version`: Run after ozone is stopped in the old
version, but before it is restarted in the new version.
+ 4. `with_new_version`: Run while ozone is running in the new version.
+
+#### non-rolling-upgrade
+
+- Any necessary conversion of on disk structures from the old version to the
new version are handled by Ozone's non-rolling upgrade framework.
+
+- Supported Callbacks:
+ 1. `setup`: Run before ozone is started in the old version.
+ 3. `with_old_version`: Run while ozone is running in the old version.
+ 3. `with_new_version_pre_finalized`: Run after ozone is stopped in the old
version, and brought back up and running in the new version pre-finalized.
Review comment:
I thought about having this callback for non-rolling upgrades, but
decided against it, since it could be used to intentionally or accidentally
modify disk structures to make the upgrade work and circumvent the upgrade
framework.
If there are useful things that could be put here we could add it, but I
could not come up with any. Changes like SCM HA will actually not be able to
run here since the cluster will still be pre-finalized on restart. Here is the
flow:
1. Start old version
2. Stop
3. Restart with new version (pre-finalized)
- New features cannot be used.
4. Stop
5. Restart with old version (downgrade)
6. Stop
- If I understand correctly, here is the proposed location of the new
callback?
7. Restart with new version (pre-finalized)
- New features cannot be used.
8. Finalize
- This is a command given to the cluster, finalization does not involve
restart.
- Only now can new features like SCM HA be used, in the
`with_new_version_finalized` callback.
- In this callback:
1. The cluster can be restarted with the new SCM HA configuration by
calling `<cluster>/load.sh` and `start_docker_env`.
2. SCM HA specific tests can now be run.
When the cluster is pre-finalized, it is using the new bits that technically
have the new features like SCM HA, but outwardly behaving like it is still the
old version.
This means enabling SCM HA in the config and restarting into a pre-finalized
state will fail. But maybe testing that it actually does fail is a good test to
have, and we can add a callback here for this if we want.
When doing upgrades where the *from* version already has SCM HA (maybe 1.2.0
-> 1.3.0 in the future), we can just use SCM HA by default throughout the whole
test, like we do with OM HA, since it will be an established feature at this
point. This is where having different callbacks for different upgrade versions
becomes useful.
##########
File path: hadoop-ozone/dist/src/main/compose/upgrade/test.sh
##########
@@ -15,18 +15,27 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-SCRIPT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )
-ALL_RESULT_DIR="$SCRIPT_DIR/result"
-mkdir -p "$ALL_RESULT_DIR"
-rm "$ALL_RESULT_DIR"/* || true
-source "$SCRIPT_DIR/../testlib.sh"
+# Version that will be run using the local build.
+: "${OZONE_CURRENT_VERSION:=1.1.0}"
+export OZONE_CURRENT_VERSION
-tests=$(find_tests)
-cd "$SCRIPT_DIR"
+TEST_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )
+source "$TEST_DIR/testlib.sh"
-run_test_scripts ${tests}
-RESULT=$?
+# Export variables needed by tests and ../testlib.sh.
+export TEST_DIR
+export COMPOSE_DIR="$TEST_DIR"
-generate_report "upgrade" "${ALL_RESULT_DIR}"
+RESULT_DIR="$ALL_RESULT_DIR" create_results_dir
-exit ${RESULT}
+# Upgrade tests to be run.
+# Run all upgrades even if one fails.
+# Any failure will save a failing return code to $RESULT.
+set +e
+run_test manual-upgrade 0.5.0 1.1.0
+run_test non-rolling-upgrade 1.0.0 1.1.0
Review comment:
Nothing will break when the new release goes out, but we will want to
update the tests to make sure they are testing upgrades from that release's new
pre-built docker image to the current code. Here are the steps:
1. Update `OZONE_CURRENT_VERSION`.
- After 1.1.0 release, this becomes OZONE_CURRENT_VERSION=1.2.0.
- Now the tests are running upgrades to and from pre-built docker
images, which will pass because its already been tested, but we don't really
need to test this case so we should update with the upgrades we actually want
to run.
2. Determine which upgrades we now want to test.
- This will probably be `manual-upgrade 1.0.0. 1.1.0` and
`non-rolling-upgrade 1.1.0 1.2.0`. Perhaps `manual-upgrade 0.5.0 1.2.0`.
- Since 1.1.0 is the last version to ship without the upgrade framework,
it will be the last one where we add a manual upgrade to it, instead of a
non-rolling upgrade.
- After a recent change I made, the tests will still run after this,
they will just not have any callbacks defined to use for the new version
upgrades.
3. Add callbacks for new upgrades
- For manual upgrades, make new callbacks with/without version specific
disk formatting changes.
- Only 0.5.0 to 1.2.0 would need the existing formatting changes we
are using, so can probably just reuse existing callbacks here.
- No formatting should be necessary for 1.0.0. to 1.1.0 since Ozone
is GA, so this will have a new set of callbacks from the other manual upgrade.
- For the non-rolling upgrade, we can just rename the callback directory
to reflect the new versions. This is a special case for 1.1.0 release, since it
is the last version shipping without the upgrade framework.
- For future releases with the upgrade framework, we will need new
callbacks to check the different MetadataLayoutVersions, and check that new
features introduced in those releases are only used after finalize.
- For 1.1.0 to 1.2.0 for example, we will want to add tests for
SCM HA which are unique to that upgrade.
- Also a special case after the 1.1.0 release: uncomment the
downgrade code since it will now be supported.
If we decide to share tests between upgrades (for example, most upgrades
will probably be doing the same read/write tests regardless of version), we can
move those to a separate file and source it in each upgrade's callback script
that needs it.
The framework is designed to be extensible. When we have rolling upgrades,
they should be easy to add as another upgrade type to test along with
non-rolling and manual, and with its own set of callback points.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]