[
https://issues.apache.org/jira/browse/CASSANDRA-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321353#comment-16321353
]
Ariel Weisberg commented on CASSANDRA-14134:
--------------------------------------------
Reviewing the commits after the first review it seems like you have been adding
a lot of sleeps to try and fix stuff and in at least the one [case I was
looking at it doesn't fix the
problem|https://github.com/apache/cassandra-dtest/commit/ab4b9b7f4b0f12fd667d16cc19d73efef99688a2].
The reason I push back on that is that future tests and modifications won't
have the necessary sleeps. We want to push this into the infrastructure code as
much as possible so we can "fix it once."
With several of them I need to review them on a case by case basis for
appropriateness and does it fix the underlying problem. If a lot of these are
schema change related I have to ask the question, why can't we change schema
and know reliably when the schema is available to use? That seems like a pretty
basic thing for the database to be able to do. If it doesn't work for us in
tests I would say it's pretty broken for users.
This also holds for nodetool. Nodetool shouldn't be generating errors some of
the time spuriously.
[This shouldn't be necessary CCM has a log line it looks for to know when a
node is up. If that's not working we need to fix
it.|https://github.com/apache/cassandra-dtest/commit/bb1415597f80f2e4e25235e1ee8369b49561fc3f]
[This is another, wait for schema change
stuff.|https://github.com/apache/cassandra-dtest/commit/c182a44b68a43dc0be50f1c3269ac220851010ac]
If we really think this is necessary we should have them call a wrapper
method. Really we shouldn't use sleeps we should do the schema change and then
spin lightly waiting for the schema to be available everywhere it should be
available by querying the nodes. Did this actually fix the test?
[I really think we need to open up to a wider audience for discussing flaky
annotations. We have already gone through this once and gone from using flaky
to not using
flaky.|https://github.com/apache/cassandra-dtest/commit/3ffa61717f760ef7bd6209ad1af9c7bba450492a]
Flaky tests should be annoying and distracting.
Another usage of flakey
https://github.com/apache/cassandra-dtest/commit/e2a3c5575155a45ae862400568be6463f47e1617
I am sort of warming up to using flaky because the alternative is that
developers just don't use the tests and we are generally worse off.
[I couldn't figure out this fix for configuration
test.|https://github.com/apache/cassandra-dtest/commit/aa225ab1c36959b3d19a607eddbae2b16f661194]
Can you help out with an explanation for this?
{{execute_concurrent_with_args}} doesn't document {{results_generator=True}}
I'm not sure if I should run the test fixes through CircleCI. Some of them have
multiple rounds of changes and it's not clear which help and which are placebo
or didn't end up contributing to reliability. Unless you know you only put in
fixes that ended up clearing up a specific error?
> Migrate dtests to use pytest and python3
> ----------------------------------------
>
> Key: CASSANDRA-14134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14134
> Project: Cassandra
> Issue Type: Improvement
> Components: Testing
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
>
> h4. Get the C* dtests running on the pytest framework.
> C* DTests currently run using the python test framework nosetest. This
> framework has been largely abandoned with no releases since 2015 and a
> general strong consensus in the python community that pytest is the future.
> h4. Why should we do this.
> Currently (and historically) dtests have always been difficult to run, flaky
> and unpredictable in CI environments, and almost impossible to debug.
> On November 28th, 2017, I proposed on the dev@ list that we move the dtests
> from nosetests to pytests. I got replies from Jon Haddad, Philip Thompson,
> and kurt greaves with really only "+1" like replies to the proposal.
> Since then I've been working pretty much non stop to complete the large
> refactor of dtests to pytests. As part of this effort (and due to the
> migration tools that exist require it) I also ported the code to python3
> (from the current python 2.7 based code-base).
> h4. High-level summary of key changes, improvements, and new features.
> * Migrate dtests from executing using the nosetest framework to pytest
> * Port the entire code base from Python 2.7 to Python 3.6
> * Update run_dtests.py to work with pytest
> * Add --dtest-print-tests-only option to run_dtests.py to get easily parsable
> list of all available collected tests
> * Update README.md for executing the dtests with pytest
> * Add new debugging tips section to README.md to help with some basics of
> debugging python3 and pytest
> * Migrate all existing Enviornment Variable usage as a means to control dtest
> operation modes to argparse command line options with documented help on each
> toggles intended usage
> * Migration of old unitTest and nose based test structure to modern pytest
> fixture approach
> * Automatic detection of physical system resources to automatically determine
> if @pytest.mark.resource_intensive annotated tests should be collected and
> run on the system where they are being executed
> * new pytest fixture replacements for @since and @pytest.mark.upgrade_test
> annotations
> * Migration to python logging framework
> * Upgrade thrift bindings to latest version with full python3 compatibility
> * Remove deprecated cql and pycassa dependencies and migrate any remaining
> tests to fully remove those dependencies
> * Fixed dozens of tests that would hang the pytest framework forever when run
> in CI enviornments
> * Ran code nearly 300 times in CircleCI during the migration and to find,
> identify, and fix any tests capable of hanging CI
> * Upgrade Tests do not yet run in CI and still need additional migration work
> (although all upgrade test classes compile successfully)
> I started with the *nose2pytest* [https://github.com/pytest-dev/nose2pytest]
> migration tool. As this required python 3 language support I found myself
> down the 2to3 python migration path. While painful to do this, the benefits
> of python3 over python2.7 are numerous and moving to python3 for the
> additional debugging tools now available to use when fixing dtests makes the
> effort worth it for that reason alone!
> After the automated tools did their thing I began what was a much longer and
> tedious manual process than I ever could have expected due to the custom many
> ways we did things in dtests (frequently to work around nosetest limitations
> of missing features that thankfully are now all included with the pytest
> framework). I've done nearly 300 test runs of my migration branch with
> circleci.
> The latest CircleCI runs can be found at:
> (dtests without vnodes) [https://circleci.com/gh/mkjellman/cassandra/277]
> (dtests with vnodes) [https://circleci.com/gh/mkjellman/cassandra/278]
> With vnodes, there are currently only 6 remaining dtest test failures.
> Without vnodes, there are 12 remaining dtest failures.
> It turns out that after the dtests were moved to ASF Jenkins from cassci, the
> jobs were misconfigured and we actually haven't been running the dtests in
> the non-vnodes configuration. The current most recent trunk dtest job to
> complete on ASF Jenkins (with vnodes) was
> [https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/387/].
> That test run had 36 test failures.
> There are great performance improvements so far with pytests. With a
> parallism factor of 90x in CircleCI the dtests-no-vnodes job completed all
> tests in 16 minutes and 24 seconds and the dtests-with-vnodes job completed
> all tests in 14 minutes and 22 seconds.. (Compare that to the 8+ hours ASF
> Jenkins currently takes!).
> A dtest on pytest compatible CircleCI configuration is available at
> https://github.com/mkjellman/cassandra/blob/trunk_circle/.circleci/config.yml
> A very small patch is required to ccm for 2 places that were not yet python3
> compatible. (I didn't do a full audit of ccm, but these appear to be the only
> two issues I've found even with a ton of testing when using ccmlib with
> dtests and running with python 3).
> {code}
> --- /home/cassandra/env/src/ccm/ccmlib/node.py 2017-12-13 03:11:13.000000000
> +0000-8").splitlines()))
> +++ node.py 2017-12-19 09:51:49.083186951 +0000
> @@ -57,10 +57,16 @@
> message = "Subprocess {} exited with non-zero status; exit status:
> {}".format(command, exit_status)
> if stdout:
> message += "; \nstdout: "
> - message += stdout
> + if isinstance(stdout, str):
> + message += stdout
> + else:
> + message += stdout.decode("utf-8")
> if stderr:
> message += "; \nstderr: "
> - message += stderr
> + if isinstance(stderr, str):
> + message += stderr
> + else:
> + message += stderr.decode("utf-8")
>
> Exception.__init__(self, message)
>
> @@ -2009,7 +2015,10 @@
>
> out, _, _ = handle_external_tool_process(p, ["sstableutil",
> '--type', 'final', ks, table])
>
> - return sorted(filter(lambda s: s.endswith('-Data.db'),
> out.splitlines()))
> + if isinstance(out, str):
> + return sorted(filter(lambda s: s.endswith('-Data.db'),
> out.splitlines()))
> + else:
> + return sorted(filter(lambda s: s.endswith('-Data.db'),
> out.decode("utf-8").splitlines()))
>
> def _get_load_from_info_output(info):
> {code}
> For the actual dtest work itself: I've squashed all my work down into a
> single commit available at
> [https://github.com/mkjellman/cassandra-dtest/tree/dtests_on_pytest_v2]
> And the actual commit available at
> [https://github.com/mkjellman/cassandra-dtest/commit/11201c149d1198c0d6d1a479e08bfb8695f14c9e]
> I tried to put a lot of effort into making the end user experience better for
> both bootstrapping and running dtests for the first time, but also day to
> day. I hope people like the approach! In addition to moving to argparse with
> commented command line arguments to control execution, I also updated the
> README.md to reflect the new pytest world and also added a new section with a
> few tips I learned while I was debugging hundreds of tests during this
> migration process that hopefully are helpful to the rest of the community
> going forward!
> I'm sure the first commit above isn't 100% perfect (although I really have
> put a lot of effort into it) but my one request is to hopefully get consensus
> with this first revision of changes committed to reduce any chance of further
> drift with the codebase as changes are committed into master.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]