i had done some limited testing on the medium size and didn't see behavior quite as bad as you were seeing... :\
i added a test fixture (sufficient_system_resources_for_resource_intensive_tests) that currently does a very basic free-memory check and deselects tests marked with @pytest.mark.resource_intensive if the current system doesn't have enough resources. my short/medium term thinking was that we could expand on this and dynamically skip tests for whatever physical resource constraints we're working with -- with the ultimate goal of dynamically running as many tests reliably as possible given what we have.

any chance you'd mind changing your circleci config to set CCM_MAX_HEAP_SIZE under resource_constrained_env_vars to 769MB and kicking off another run to get us a baseline? i see a ton of the failures were from tests that run stress to pre-fill the cluster for the test.. do you know if we have a way to control the heap settings of stress when it's invoked via ccm.node as we do in the dtests?

On Jan 10, 2018, at 1:04 PM, Stefan Podkowinski <s...@apache.org> wrote:

I was giving this another try today to see how long it would take to finish on an OSS account. But I've canceled the job after some hours as tests started to fail almost constantly.

https://circleci.com/gh/spodkowinski/cassandra/176

Looks like the 2CPU/4096MB (medium) limit for each container isn't really adequate for dtests. Yours seem to be running on xlarge.

On 10.01.18 21:05, Michael Kjellman wrote:

the plan of action is to continue running everything on asf jenkins. additionally, all developers (just like today) will be free to run the unit tests and as many of the dtests as possible against their local test branches in circleci. circleci offers a free OSS account with 4 containers. while it will be slow, it will run. additionally, anyone who wants more speed is obviously free to upgrade their account.

does that plan resolve any concerns you have?
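[editor's note: the deselection the sufficient_system_resources_for_resource_intensive_tests fixture performs, as described above, can be sketched roughly like this -- a hypothetical illustration only; the threshold, hook shape, and helper name are assumptions, not code from the branch:]

```python
import pytest

# Assumed threshold, purely for illustration.
MIN_AVAILABLE_MEM_BYTES = 9 * 1024 ** 3

def has_sufficient_resources(available_bytes, required_bytes=MIN_AVAILABLE_MEM_BYTES):
    # Pure predicate so the policy itself is trivially unit-testable.
    return available_bytes >= required_bytes

def pytest_collection_modifyitems(config, items):
    # The real fixture would measure actual free memory (e.g. via psutil);
    # here we read an injected value to keep the sketch self-contained.
    available = getattr(config, "available_memory_bytes", 0)
    if has_sufficient_resources(available):
        return
    skip_marker = pytest.mark.skip(reason="insufficient system resources")
    for item in items:
        if item.get_closest_marker("resource_intensive"):
            item.add_marker(skip_marker)
```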
On Jan 10, 2018, at 12:01 PM, Josh McKenzie <jmcken...@apache.org> wrote:

> 1) have *all* our tests run on *every* commit

Have we discussed the cost / funding aspect of this? I know we as a project have run into infra-donation cost issues in the past with differentiating between ASF as a whole and cassandra as a project, so I'm not sure how that'd work in terms of sponsors funding circleci containers just for this project's use, for instance.

This is a huge improvement in runtime (understatement of the day award...), so great work on that front.

On Tue, Jan 9, 2018 at 11:04 PM, Nate McCall <zznat...@gmail.com> wrote:

Making these tests more accessible and reliable is super huge. There are a lot of folks in our community who are not well versed in python (myself included). I wholly support *any* efforts we can make for the dtest process to be easy. Thanks a bunch for taking this on. I think it will pay off quickly.

On Wed, Jan 10, 2018 at 4:55 PM, Michael Kjellman <kjell...@apple.com> wrote:

hi! a few of us have been continuously iterating on the dtest-on-pytest branch since the 2nd, and we've run the dtests close to 600 times in ci. ariel has been working his way thru a formal review (three cheers for ariel!)

flaky tests are a real thing, and despite a few dozen totally green test runs, the vast majority of runs still reliably hit roughly 1-3 test failures. in a world where we can now run the dtests in 20 minutes instead of 13 hours, it's now at least possible to keep finding these flaky tests and fixing them one by one...

i haven't gotten a huge amount of feedback overall and i really want to hear it! ultimately this work is driven by the desire to 1) have *all* our tests run on *every* commit; 2) be able to trust the results; 3) make our testing story so amazing that even the most casual weekend warrior who wants to work on the project can (and will want to!) use it.

i'm *not* a python guy (although luckily i know and work with many who are).
thankfully i've been able to defer to them for much of this largely python-based effort... i'm sure there are a few more people working on the project who do consider themselves python experts, and i'd especially appreciate your feedback!

finally, a lot of my effort was focused on improving the end-user experience (getting bootstrapped, running the tests, improving the debuggability story, etc). i'd really appreciate it if people could try running the pytest branch and following the install instructions to figure out what could be improved. is there any existing behavior i've inadvertently removed that's going to make someone's life miserable? 😅

thanks! looking forward to hearing any and all feedback from the community!

best,
kjellman

On Jan 3, 2018, at 8:08 AM, Michael Kjellman <mkjell...@internalcircle.com> wrote:

no, i'm not. i just figured i should target python 3.6 if i was doing this work in the first place. the current Ubuntu LTS was pulling in a pretty old version. any concerns with using 3.6?

On Jan 3, 2018, at 1:51 AM, Stefan Podkowinski <s...@apache.org> wrote:

The latest updates to your branch fixed the logging issue, thanks! Tests now seem to execute fine locally using pytest.

I was looking at the dockerfile and noticed that you explicitly use python 3.6 there. Are you aware of any issues with older python3 versions, e.g. 3.5? Do I have to use 3.6 locally as well, and do we have to do the same for jenkins?

On 02.01.2018 22:42, Michael Kjellman wrote:

I reproduced the NOTSET log issue locally... got a fix.. i'll push a commit up in a moment.

On Jan 2, 2018, at 11:24 AM, Michael Kjellman <mkjell...@internalcircle.com> wrote:

Comments inline. Thanks for giving this a go!!

On Jan 2, 2018, at 6:10 AM, Stefan Podkowinski <s...@apache.org> wrote:

I was giving this a try today with some mixed results.
First of all, running pytest locally would fail with a "ccmlib.common.ArgumentError: Unknown log level NOTSET" error for each test, although I created a new virtualenv for that as described in the readme (thanks for updating!) and used both your dtest and cassandra branches. But I haven't patched ccm as described in the ticket; maybe that's why? Can you publish a patched ccm branch to gh?

99% sure this is an issue parsing the logging level passed from pytest to the python logger... could you paste the exact command you're using to invoke pytest? should be a small change - i'm sure i just missed an invocation case.

The updated circle.yml is now using docker, which seems to be a good idea to reduce clutter in the yaml file and gives us more control over the test environment. Can you add the Dockerfile to the .circleci directory as well? I couldn't find it when I was trying to solve the pytest error mentioned above.

this is already tracked in a separate repo: https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile

Next thing I did was to push your trunk_circle branch to my gh repo to start a circleCI run. Finishing all dtests in 15 minutes sounds exciting, but requires a paid tier plan to get that kind of parallelization. Looks like the dtests have even been deliberately disabled for non-paid accounts, so I couldn't test this any further.

the plan of action (as i already mentioned in previous emails) is to get dtests working for the free circleci oss accounts as well. part of this work (already included in this pytest effort) is to have fixtures that look at the system resources and dynamically include tests as possible.

Running dtests from the pytest branch on builds.apache.org did not work either. At least the run_dtests.py arguments will need to be updated in cassandra-builds. We currently only use a single cassandra-dtest.sh script for all builds.
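[editor's note: the "Unknown log level NOTSET" error above is consistent with reading an unconfigured logger's own level attribute, which defaults to 0 ("NOTSET"), and forwarding that name to ccm. A speculative sketch of a defensive conversion -- the function name and fallback default are assumptions, not the actual fix:]

```python
import logging

def level_name_for_ccm(logger, default="INFO"):
    # An unconfigured logger's own level is NOTSET (0); logging.getLevelName
    # turns that into the string "NOTSET", which ccm rejects. Fall back to a
    # sane default instead of forwarding it.
    name = logging.getLevelName(logger.level)
    return default if name == "NOTSET" else name
```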
Maybe we should create a new job template that would use an updated script with the wip-pytest dtest branch, to make this work testable in parallel.

yes, i didn't touch cassandra-builds yet.. focused on getting circleci and local runs working first... once we're happy with that and it's stable, we can make the changes to the jenkins configs pretty easily...

On 21.12.2017 11:13, Michael Kjellman wrote:

I just created https://issues.apache.org/jira/browse/CASSANDRA-14134, which includes tons of details (and a patch available for review) on my efforts to migrate the dtests from nosetest to pytest (which ultimately ended up also including porting the code from python 2.7 to python 3). I'd love it if people could pitch in in any way to help get this reviewed and committed, so we can reduce the natural drift that will occur with a huge patch like this against the changes going into master.

I apologize for sending this so close to the holidays, but I really have been working non-stop trying to get things into a completed and stable state.

The latest CircleCI runs I did took roughly 15 minutes to run all the dtests, with only 6 failures remaining (when run with vnodes) and 12 failures remaining (when run without vnodes). For comparison, the last ASF Jenkins Dtest job to successfully complete took nearly 10 hours (9:51) and had 36 test failures.

Of note, while I was working on this and trying to determine a baseline for the existing tests, I found that the ASF Jenkins jobs were incorrectly configured due to a typo: the no-vnodes job is actually running with vnodes (meaning the no-vnodes job is identical to the with-vnodes ASF Jenkins job). There are some bootstrap tests that will 100% reliably hang both nosetest and pytest on test cleanup; however, these tests only run in the no-vnodes configuration. I've debugged and fixed a lot of these cases across many test cases over the past few weeks, and I no longer know of any tests that can hang CI.
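[editor's note: one generic way to keep a hung test from stalling CI forever is to fence it with a timeout. A stdlib-only sketch of the idea, illustrative only -- not the mechanism used in the branch, where a plugin such as pytest-timeout would be the more idiomatic choice:]

```python
import threading

def run_with_timeout(fn, timeout_s):
    """Run fn in a daemon thread; return (result, finished_in_time)."""
    outcome = {}

    def target():
        outcome["value"] = fn()

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The worker is abandoned (it's a daemon thread), so a hung
        # test body cannot keep the process alive indefinitely.
        return None, False
    return outcome.get("value"), True
```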
Thanks, and I'm optimistic about making testing great for the project and, most importantly, for the OSS C* community!

best,
kjellman

Some highlights that I quickly thought of (in no particular order) {also included in the JIRA}:

- Migrate dtests from executing using the nosetest framework to pytest
- Port the entire code base from Python 2.7 to Python 3.6
- Update run_dtests.py to work with pytest
- Add --dtest-print-tests-only option to run_dtests.py to get an easily parsable list of all available collected tests
- Update README.md for executing the dtests with pytest
- Add new debugging tips section to README.md to help with some basics of debugging python3 and pytest
- Migrate all existing environment variable usage as a means to control dtest operation modes to argparse command line options, with documented help on each toggle's intended usage
- Migration of the old unittest and nose based test structure to a modern pytest fixture approach
- Automatic detection of physical system resources to determine if @pytest.mark.resource_intensive annotated tests should be collected and run on the system where they are being executed
- New pytest fixture replacements for the @since and @pytest.mark.upgrade_test annotations
- Migration to the python logging framework
- Upgrade thrift bindings to the latest version with full python3 compatibility
- Remove deprecated cql and pycassa dependencies and migrate any remaining tests to fully remove those dependencies
- Fixed dozens of tests that would hang the pytest framework forever when run in CI environments
- Ran the code nearly 300 times in CircleCI during the migration to find, identify, and fix any tests capable of hanging CI
- Upgrade tests do not yet run in CI and still need additional migration work (although all upgrade test classes compile successfully)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional
commands, e-mail: dev-h...@cassandra.apache.org
---------------------------------------------------------------------