errose28 commented on a change in pull request #2056:
URL: https://github.com/apache/ozone/pull/2056#discussion_r608223192



##########
File path: 
hadoop-ozone/dist/src/main/compose/upgrade/upgrades/manual-upgrade/test.sh
##########
@@ -20,32 +20,37 @@
 
 set -e -o pipefail
 
-COMPOSE_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-export COMPOSE_DIR
-
-: "${OZONE_REPLICATION_FACTOR:=3}"
-: "${OZONE_UPGRADE_FROM:="0.5.0"}"
-: "${OZONE_UPGRADE_TO:="1.0.0"}"
-: "${OZONE_VOLUME:="${COMPOSE_DIR}/data"}"
-
-export OZONE_REPLICATION_FACTOR OZONE_UPGRADE_FROM OZONE_UPGRADE_TO 
OZONE_VOLUME
-
-source "${COMPOSE_DIR}/testlib.sh"
-
-create_data_dir
-
-prepare_for_binary_image "${OZONE_UPGRADE_FROM}"
-load_version_specifics "${OZONE_UPGRADE_FROM}"
-first_run
-unload_version_specifics
-
-from=$(get_logical_version "${OZONE_UPGRADE_FROM}")
-to=$(get_logical_version "${OZONE_UPGRADE_TO}")
-execute_upgrade_steps "$from" "$to"
-
-prepare_for_binary_image "${OZONE_UPGRADE_TO}"
-load_version_specifics "${OZONE_UPGRADE_TO}"
-second_run
-unload_version_specifics
-
-generate_report
+# Fail if required vars are not set.

Review comment:
       I see your point. A name like `test.sh` seems to imply that you can run 
it directly and it will do the testing just fine, since this is the pattern 
with the other acceptance tests. Perhaps renaming to `driver.sh` would be 
better, since they are driving the specified upgrade flow? I would like to use 
the `run_test_script` functionality here, however. Maybe that function can be 
modified to read from a variable specifying the name of the file to use.
   
   I do think the variables should be required to be defined here. I don't 
think there are sensible defaults so I would rather fail with a clear error 
than do something unintentional if the test is incorrectly set up.
   
   Let me know your thoughts.

##########
File path: hadoop-ozone/dist/src/main/compose/upgrade/README.md
##########
@@ -12,18 +12,91 @@
   limitations under the License. See accompanying LICENSE file.
 -->
 
-# Compose file for upgrade
+# Ozone Upgrade Acceptance Tests
 
-This directory contains a sample cluster definition and script for
-testing upgrade from previous version to the current one.
+This directory contains cluster definitions and scripts for testing upgrades 
from any previous version to another
+previous version, or to the local build of the code. It is designed to catch 
backwards incompatible changes made between
+an older release of Ozone and a later release (which may be the local build).
 
-Data for each container is persisted in mounted volume (by default it's
-`data` under the `compose/upgrade` directory, but can be overridden via
-`OZONE_VOLUME` environment variable).
+## IMPORTANT NOTES
 
-Prior version is run using an official `apache/ozone` image, while the
-current version is run with the `ozone-runner` image using locally built
-source code.
+1. Backwards Incompatibility
+    - These tests will not catch backwards incompatible changes against 
commits in between releases.
+        - Example:
+            1. After 1.0.0, a change *c1* is made that is backwards compatible 
with *1.0.0*.
+            2. After *c1*, a new change *c2* is made that is also backwards 
compatible with 1.0.0 but backwards *incompatible* with *c1*.
 
-Currently the test script only supports a single version upgrade (eg.
-from 0.5.0 to 1.0.0).
+            - This test suite will not raise an error for *c2*, because it 
only tests against the last release
+            (1.0.0), and not the last commit (*c1*).
+
+2. Downgrade Support
+    - Downgrades will not be supported until upgrading from 1.1.0 to 1.2.0
+
+    - Until 1.1.0 is released, downgrades cannot be tested, so they are 
commented out of the current non-rolling upgrade tests.
+
+## Directory Layout
+
+### upgrades
+
+Each type of upgrade has a subdirectory under the *upgrades* directory. Each 
upgrade's steps are controlled by a *test.sh* script in its 
*upgrades/\<upgrade-type>* directory. Callbacks to execute throughout the 
upgrade are called by this script and should be placed in a file called 
*callback.sh* in the *upgrades/\<upgrade-type>/\<upgrade-from>-\<upgrade-to>* 
directory. After the test is run, results and docker volume data for the 
upgrade for these versions will also be placed in this directory. The results 
of all upgrades run as part of the tests will be placed in a *results* folder 
in the top level upgrade directory.
+
+#### manual-upgrade
+
+- Any necessary conversion of on disk structures from the old version to the 
new version must be done explicitly.
+
+- This is primarily for testing upgrades from versions before the non-rolling 
upgrade framework was introduced.
+
+- Supported Callbacks:
+    1. `setup_with_old_version`: Run before ozone is started in the old 
version.
+    3. `with_old_version`: Run while ozone is running in the old version.
+    3. `setup_with_new_version`: Run after ozone is stopped in the old 
version, but before it is restarted in the new version.
+    4. `with_new_version`: Run while ozone is running in the new version.
+
+#### non-rolling-upgrade
+
+- Any necessary conversion of on disk structures from the old version to the 
new version are handled by Ozone's non-rolling upgrade framework.
+
+- Supported Callbacks:
+    1. `setup`: Run before ozone is started in the old version.
+    3. `with_old_version`: Run while ozone is running in the old version.
+    3. `with_new_version_pre_finalized`: Run after ozone is stopped in the old 
version, and brought back up and running in the new version pre-finalized.

Review comment:
       I thought about having this callback for non-rolling upgrades, but 
decided against it, since it could be used to intentionally or accidentally 
modify disk structures to make the upgrade work and circumvent the upgrade 
framework.
   If there are useful things that could be put here we could add it, but I 
could not come up with any. Changes like SCM HA will actually not be able to 
run here since the cluster will still be pre-finalized on restart. Here is the 
flow:
   
   1. Start old version
   2. Stop
   3. Restart with new version (pre-finalized)
       - New features cannot be used.
   4. Stop
   5. Restart with old version (downgrade)
   6. Stop
       - If I understand correctly, here is the proposed location of the new 
callback?
   7. Restart with new version (pre-finalized)
       - New features cannot be used.
   8. Finalize
       - This is a command given to the cluster, finalization does not involve 
restart.
       - Only now can new features like SCM HA be used, in the 
`with_new_version_finalized` callback.
       - In this callback:
           1. The cluster can be restarted with the new SCM HA configuration by 
calling `<cluster>/load.sh` and `start_docker_env`.
           2. SCM HA specific tests can now be run.
   
   When the cluster is pre-finalized, it is using the new bits that technically 
have the new features like SCM HA, but outwardly behaving like it is still the 
old version.
   This means enabling SCM HA in the config and restarting into a pre-finalized 
state will fail. But maybe testing that it actually does fail is a good test to 
have, and we can add a callback here for this if we want.
   
   When doing upgrades where the *from* version already has SCM HA (maybe 1.2.0 
-> 1.3.0 in the future), we can just use SCM HA by default throughout the whole 
test, like we do with OM HA, since it will be an established feature at this 
point. This is where having different callbacks for different upgrade versions 
becomes useful.

##########
File path: hadoop-ozone/dist/src/main/compose/upgrade/test.sh
##########
@@ -15,18 +15,27 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-SCRIPT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )
-ALL_RESULT_DIR="$SCRIPT_DIR/result"
-mkdir -p "$ALL_RESULT_DIR"
-rm "$ALL_RESULT_DIR"/* || true
-source "$SCRIPT_DIR/../testlib.sh"
+# Version that will be run using the local build.
+: "${OZONE_CURRENT_VERSION:=1.1.0}"
+export OZONE_CURRENT_VERSION
 
-tests=$(find_tests)
-cd "$SCRIPT_DIR"
+TEST_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )
+source "$TEST_DIR/testlib.sh"
 
-run_test_scripts ${tests}
-RESULT=$?
+# Export variables needed by tests and ../testlib.sh.
+export TEST_DIR
+export COMPOSE_DIR="$TEST_DIR"
 
-generate_report "upgrade" "${ALL_RESULT_DIR}"
+RESULT_DIR="$ALL_RESULT_DIR" create_results_dir
 
-exit ${RESULT}
+# Upgrade tests to be run.
+# Run all upgrades even if one fails.
+# Any failure will save a failing return code to $RESULT.
+set +e
+run_test manual-upgrade 0.5.0 1.1.0
+run_test non-rolling-upgrade 1.0.0 1.1.0

Review comment:
       Nothing will break when the new release goes out, but we will want to 
update the tests to make sure they are testing upgrades from that release's new 
pre-built docker image to the current code. Here are the steps:
   
   1. Update `OZONE_CURRENT_VERSION`.
       - After 1.1.0 release, this becomes OZONE_CURRENT_VERSION=1.2.0.
       - Now the tests are running upgrades to and from pre-built docker 
images, which will pass because its already been tested, but we don't really 
need to test this case so we should update with the upgrades we actually want 
to run.
   2. Determine which upgrades we now want to test.
       - This will probably be `manual-upgrade 1.0.0. 1.1.0` and 
`non-rolling-upgrade 1.1.0 1.2.0`. Perhaps `manual-upgrade 0.5.0 1.2.0`.
       - Since 1.1.0 is the last version to ship without the upgrade framework, 
it will be the last one where we add a manual upgrade to it, instead of a 
non-rolling upgrade.
       - After a recent change I made, the tests will still run after this, 
they will just not have any callbacks defined to use for the new version 
upgrades.
   3. Add callbacks for new upgrades
       - For manual upgrades, make new callbacks with/without version specific 
disk formatting changes.
           - Only 0.5.0 to 1.2.0 would need the existing formatting changes we 
are using, so can probably just reuse existing callbacks here.
           - No formatting should be necessary for 1.0.0. to 1.1.0 since Ozone 
is GA, so this will have a new set of callbacks from the other manual upgrade.
       - For the non-rolling upgrade, we can just rename the callback directory 
to reflect the new versions. This is a special case for 1.1.0 release, since it 
is the last version shipping without the upgrade framework.
           - For future releases with the upgrade framework, we will need new 
callbacks to check the different MetadataLayoutVersions, and check that new 
features introduced in those releases are only used after finalize.
               - For 1.1.0 to 1.2.0 for example, we will want to add tests for 
SCM HA which are unique to that upgrade.
           - Also a special case after the 1.1.0 release: uncomment the 
downgrade code since it will now be supported.
   
   If we decide to share tests between upgrades (for example, most upgrades 
will probably be doing the same read/write tests regardless of version), we can 
move those to a separate file and source it in each upgrade's callback script 
that needs it.
   The framework is designed to be extensible. When we have rolling upgrades, 
they should be easy to add as another upgrade type to test along with 
non-rolling and manual, and with its own set of callback points.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to