another thought is to have the sufficient_system_resources_for_resource_intensive_tests fixture dynamically figure out the number of threads to run stress with. it seems reasonable that we should significantly lower our concurrency when we're resource constrained.
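something like this rough sketch is what i have in mind (psutil and the
thresholds here are illustrative guesses, not code from the branch):

    # rough sketch: derive a stress thread count from what the host actually
    # has instead of hardcoding one (the 8GB threshold is made up)
    import multiprocessing

    import psutil  # assumed available in the dtest environment
    import pytest


    @pytest.fixture(scope="session")
    def stress_thread_count():
        cpus = multiprocessing.cpu_count()
        available_gb = psutil.virtual_memory().available / 1024 ** 3
        if available_gb < 8:
            # resource constrained (e.g. a circleci medium container):
            # back way off on concurrency
            return max(1, cpus // 2)
        return cpus

tests that invoke stress could then take stress_thread_count as a fixture
argument and pass it through as -rate threads=N instead of a fixed count.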
> On Jan 10, 2018, at 1:53 PM, Michael Kjellman <mkjell...@internalcircle.com> wrote:
>
> i had done some limited testing on the medium size and didn't see quite as
> bad behavior as you were seeing... :\
>
> i added a test fixture
> (sufficient_system_resources_for_resource_intensive_tests) that currently
> just does a very basic free memory check and deselects tests annotated with
> @pytest.mark.resource_intensive if the current system doesn't have enough
> resources.
>
> my short/medium term thinking was that we could expand on this and
> dynamically skip tests for whatever physical resource constraints we're
> working with -- with the ultimate goal of dynamically running as many tests
> reliably as possible given what we have.
>
> any chance you'd mind changing your circleci config to set CCM_MAX_HEAP_SIZE
> under resource_constrained_env_vars to 769MB and kicking off another run to
> get us a baseline? i see a ton of the failures were from tests that run
> stress to pre-fill the cluster for the test... do you know if we have a way
> to control the heap settings of stress when it's invoked via ccm.node as we
> do in the dtests?
>
> On Jan 10, 2018, at 1:04 PM, Stefan Podkowinski <s...@apache.org> wrote:
>
> I was giving this another try today to see how long it would take to finish
> on an OSS account, but I've canceled the job after some hours as tests
> started to fail almost constantly.
>
> https://circleci.com/gh/spodkowinski/cassandra/176
>
> Looks like the 2CPU/4096MB (medium) limit for each container isn't really
> adequate for dtests. Yours seem to be running on xlarge.
>
> On 10.01.18 21:05, Michael Kjellman wrote:
>
> plan of action is to continue running everything on asf jenkins.
>
> in addition, all developers (just like today) will be free to run the unit
> tests and as many of the dtests as possible against their local test
> branches in circleci. circleci offers a free OSS account with 4 containers.
> while it will be slow, it will run. additionally, anyone who wants more
> speed is obviously free to upgrade their account.
>
> does that plan resolve any concerns you have?
>
> On Jan 10, 2018, at 12:01 PM, Josh McKenzie <jmcken...@apache.org> wrote:
>
>> 1) have *all* our tests run on *every* commit
>
> Have we discussed the cost / funding aspect of this? I know we as a project
> have run into infra-donation cost issues in the past with differentiating
> between ASF as a whole and cassandra as a project, so I'm not sure how
> that'd work in terms of sponsors funding circleci containers just for this
> project's use, for instance.
>
> This is a huge improvement in runtime (understatement of the day award...),
> so great work on that front.
>
> On Tue, Jan 9, 2018 at 11:04 PM, Nate McCall <zznat...@gmail.com> wrote:
>
> Making these tests more accessible and reliable is super huge. There are a
> lot of folks in our community who are not well versed in python (myself
> included). I wholly support *any* efforts we can make to keep the dtest
> process easy.
>
> Thanks a bunch for taking this on. I think it will pay off quickly.
>
> On Wed, Jan 10, 2018 at 4:55 PM, Michael Kjellman <kjell...@apple.com> wrote:
>
> hi!
>
> a few of us have been continuously iterating on the dtest-on-pytest branch
> now since the 2nd and we've run the dtests close to 600 times in ci. ariel
> has been working his way through a formal review (three cheers for ariel!)
>
> flaky tests are a real thing, and despite a few dozen totally green test
> runs, the vast majority of runs are still reliably hitting roughly 1-3 test
> failures. in a world where we can now run the dtests in 20 minutes instead
> of 13 hours, it's now at least possible to keep finding these flaky tests
> and fixing them one by one...
>
> i haven't gotten a huge amount of feedback overall and i really want to
> hear it! ultimately this work is driven by the desire to 1) have *all* our
> tests run on *every* commit; 2) be able to trust the results; 3) make our
> testing story so amazing that even the most casual weekend warrior who
> wants to work on the project can (and will want to!) use it.
>
> i'm *not* a python guy (although luckily i know and work with many who
> are). thankfully i've been able to defer to them for much of this largely
> python-based effort... i'm sure there are a few more people working on the
> project who do consider themselves python experts, and i'd especially
> appreciate your feedback!
>
> finally, a lot of my effort was focused on improving the end user's
> experience (getting bootstrapped, running the tests, improving the
> debuggability story, etc). i'd really appreciate it if people could try
> running the pytest branch and following the install instructions to figure
> out what could be improved on. any existing behavior i've inadvertently
> removed that's going to make someone's life miserable? 😅
>
> thanks! looking forward to hearing any and all feedback from the community!
>
> best,
> kjellman
>
> On Jan 3, 2018, at 8:08 AM, Michael Kjellman <mkjell...@internalcircle.com> wrote:
>
> no, i'm not. i just figured i should target python 3.6 if i was doing this
> work in the first place. the current Ubuntu LTS was pulling in a pretty old
> version. any concerns with using 3.6?
>
> On Jan 3, 2018, at 1:51 AM, Stefan Podkowinski <s...@apache.org> wrote:
>
> The latest updates to your branch fixed the logging issue, thanks! Tests
> now seem to execute fine locally using pytest.
>
> I was looking at the dockerfile and noticed that you explicitly use python
> 3.6 there. Are you aware of any issues with older python3 versions, e.g.
> 3.5? Do I have to use 3.6 locally as well, and do we have to do the same
> for jenkins?
>
> On 02.01.2018 22:42, Michael Kjellman wrote:
>
> I reproduced the NOTSET log issue locally... got a fix... i'll push a
> commit up in a moment.
>
> On Jan 2, 2018, at 11:24 AM, Michael Kjellman <mkjell...@internalcircle.com> wrote:
>
> Comments inline. Thanks for giving this a go!!
>
> On Jan 2, 2018, at 6:10 AM, Stefan Podkowinski <s...@apache.org> wrote:
>
>> I was giving this a try today with some mixed results. First of all,
>> running pytest locally would fail with a "ccmlib.common.ArgumentError:
>> Unknown log level NOTSET" error for each test, even though I created a new
>> virtualenv for it as described in the readme (thanks for updating!) and
>> used both your dtest and cassandra branches. But I haven't patched ccm as
>> described in the ticket; maybe that's why? Can you publish a patched ccm
>> branch to gh?
>
> 99% sure this is an issue passing the logging level given to pytest through
> to the python logger... could you paste the exact command you're using to
> invoke pytest? should be a small change - i'm sure i just missed an
> invocation case.
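The fix being described amounts to normalizing the level string before it
ever reaches ccm. A minimal sketch, assuming a hypothetical helper named
parse_log_level (this is not the actual code that landed on the branch):

    import logging
    from typing import Optional


    def parse_log_level(value: Optional[str]) -> int:
        # hypothetical helper: an unset level -- or the literal "NOTSET" an
        # unconfigured logger reports -- falls back to INFO instead of being
        # handed to ccm, which rejects "NOTSET" as an unknown log level
        if not value or value.upper() == "NOTSET":
            return logging.INFO
        return getattr(logging, value.upper(), logging.INFO)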
>
>> The updated circle.yml is now using docker, which seems to be a good idea
>> to reduce clutter in the yaml file and gives us more control over the test
>> environment. Can you add the Dockerfile to the .circleci directory as
>> well? I couldn't find it when I was trying to solve the pytest error
>> mentioned above.
>
> this is already tracked in a separate repo:
> https://github.com/mkjellman/cassandra-test-docker/blob/master/Dockerfile
>
>> Next thing I did was to push your trunk_circle branch to my gh repo to
>> start a circleCI run. Finishing all dtests in 15 minutes sounds exciting,
>> but requires a paid tier plan to get that kind of parallelization. Looks
>> like the dtests have even been deliberately disabled for non-paid
>> accounts, so I couldn't test this any further.
>
> the plan of action (i already mentioned this in previous emails) is to get
> dtests working for the free circleci oss accounts as well. part of this
> work (already included in this pytest effort) is to have fixtures that look
> at the system resources and dynamically include as many tests as possible
> (sketched below).
>
>> Running dtests from the pytest branch on builds.apache.org did not work
>> either. At least the run_dtests.py arguments will need to be updated in
>> cassandra-builds. We currently only use a single cassandra-dtest.sh script
>> for all builds. Maybe we should create a new job template that would use
>> an updated script with the wip-pytest dtest branch, to make this work and
>> testable in parallel.
>
> yes, i didn't touch cassandra-builds yet... focused on getting circleci and
> local runs working first... once we're happy with that and stable, we can
> make the changes to jenkins configs pretty easily...
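The "fixtures that look at the system resources" mentioned above boil down
to gating test collection on a resource check. A minimal sketch of the idea
as a conftest.py collection hook, with an illustrative threshold (the
branch's actual fixture may work differently):

    # conftest.py -- illustrative sketch, not the branch's implementation
    import psutil  # assumed available in the dtest environment

    MIN_AVAILABLE_MEMORY_GB = 16  # made-up threshold for illustration


    def sufficient_system_resources_for_resource_intensive_tests():
        available = psutil.virtual_memory().available
        return available >= MIN_AVAILABLE_MEMORY_GB * 1024 ** 3


    def pytest_collection_modifyitems(config, items):
        # on small machines, deselect @pytest.mark.resource_intensive tests
        # so the rest of the suite still runs reliably
        if sufficient_system_resources_for_resource_intensive_tests():
            return
        kept, dropped = [], []
        for item in items:
            if item.get_closest_marker("resource_intensive"):
                dropped.append(item)
            else:
                kept.append(item)
        if dropped:
            config.hook.pytest_deselected(items=dropped)
            items[:] = kept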
>
> On 21.12.2017 11:13, Michael Kjellman wrote:
>
> I just created https://issues.apache.org/jira/browse/CASSANDRA-14134, which
> includes tons of details (and a patch available for review) on my efforts
> to migrate the dtests from nosetest to pytest (which ultimately ended up
> also including porting the code from python 2.7 to python 3).
>
> I'd love it if people could pitch in in any way to help get this reviewed
> and committed, so we can reduce the natural drift that will occur with a
> huge patch like this against the changes going into master. I apologize for
> sending this so close to the holidays, but I really have been working
> non-stop trying to get things into a completed and stable state.
>
> The latest CircleCI runs I did took roughly 15 minutes to run all the
> dtests, with only 6 failures remaining (when run with vnodes) and 12
> failures remaining (when run without vnodes). For comparison, the last ASF
> Jenkins dtest job to successfully complete took nearly 10 hours (9:51) and
> had 36 test failures. Of note, while I was working on this and trying to
> determine a baseline for the existing tests, I found that the ASF Jenkins
> jobs were incorrectly configured due to a typo: the no-vnodes job is
> actually running with vnodes (meaning the no-vnodes job is identical to the
> with-vnodes ASF Jenkins job). There are some bootstrap tests that will 100%
> reliably hang both nosetest and pytest on test cleanup; however, these
> tests only run in the no-vnodes configuration. I've debugged and fixed a
> lot of these cases across many test cases over the past few weeks, and I no
> longer know of any tests that can hang CI.
>
> Thanks, and I'm optimistic about making testing great for the project and,
> most importantly, for the OSS C* community!
>
> best,
> kjellman
>
> Some highlights that I quickly thought of (in no particular order):
> {also included in the JIRA}
> - Migrate dtests from executing using the nosetest framework to pytest
> - Port the entire code base from Python 2.7 to Python 3.6
> - Update run_dtests.py to work with pytest
> - Add a --dtest-print-tests-only option to run_dtests.py to get an easily
>   parsable list of all available collected tests
> - Update README.md for executing the dtests with pytest
> - Add a new debugging tips section to README.md to help with some basics of
>   debugging python3 and pytest
> - Migrate all existing environment variable usage as a means to control
>   dtest operation modes to argparse command line options, with documented
>   help on each toggle's intended usage
> - Migrate the old unittest and nose based test structure to the modern
>   pytest fixture approach
> - Automatically detect physical system resources to determine if
>   @pytest.mark.resource_intensive annotated tests should be collected and
>   run on the system where they are being executed
> - New pytest fixture replacements for the @since and
>   @pytest.mark.upgrade_test annotations (see the sketch below)
> - Migrate to the python logging framework
> - Upgrade thrift bindings to the latest version with full python3
>   compatibility
> - Remove the deprecated cql and pycassa dependencies and migrate any
>   remaining tests to fully remove those dependencies
> - Fixed dozens of tests that would hang the pytest framework forever when
>   run in CI environments
> - Ran the code nearly 300 times in CircleCI during the migration to find,
>   identify, and fix any tests capable of hanging CI
> - Upgrade tests do not yet run in CI and still need additional migration
>   work (although all upgrade test classes compile successfully)
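To illustrate the fixture replacement for the old @since decorator mentioned
in the highlights, here is a minimal self-contained sketch; the names
(fixture_since, cassandra_version) and the mechanics are assumptions for
illustration, not necessarily what the branch implements:

    import pytest
    from distutils.version import LooseVersion


    @pytest.fixture(scope="session")
    def cassandra_version():
        # stand-in: the real suite would derive this from the cluster under test
        return "3.11.1"


    @pytest.fixture(autouse=True)
    def fixture_since(request, cassandra_version):
        # skip any test marked @pytest.mark.since("X.Y") when the version
        # under test is older than X.Y
        marker = request.node.get_closest_marker("since")
        if marker and LooseVersion(cassandra_version) < LooseVersion(marker.args[0]):
            pytest.skip("requires Cassandra >= {}".format(marker.args[0]))


    @pytest.mark.since("4.0")
    def test_new_feature():
        # collected everywhere, but skipped on clusters older than 4.0
        assert True

A test opts in with the marker instead of the old decorator, and the skip
decision is made at setup time against the version actually under test.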
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org