> On 3 Sep 2018, at 09:55, Vincent Massol <[email protected]> wrote:
>
> I propose to do this tomorrow Tuesday, starting with an intro from me, using youtube live.
Say, 10AM Paris time. Thanks
-Vincent

> WDYT?
>
> Thanks
> -Vincent
>
>> On 30 Aug 2018, at 12:27, Adel Atallah <[email protected]> wrote:
>>
>> Just to be clear, when I proposed "having a whole day dedicated to using these tools", I didn't mean having it every week but only once, so we can properly start improving the tests. It would be some kind of training.
>> On my side, I don't think I'll be able to have, in a single week, one day dedicated to tests and one for bug fixing; I wouldn't have any time left for the roadmap as I will only work on the product 50% of the time.
>>
>> On Thu, Aug 30, 2018 at 12:18 PM, Vincent Massol <[email protected]> wrote:
>>> Hi,
>>>
>>> I don't remember discussing this with you Thomas. Actually I'm not convinced we should have a fixed day:
>>> * we already have a fixed BFD and having a second fixed day doesn't leave much flexibility for working on roadmap items when that's the best use of the time
>>> * test sessions can be short (0.5-1 hours) and it's easy to do them between other tasks
>>> * it can be boring to spend a full day on them
>>>
>>> Now, I agree that not having a fixed day will make it hard to make sure that we work 20% on that topic.
>>>
>>> So if you prefer we can define a day, knowing that some won't always be able to attend on that day, and in that case they should do it on another day. What's important is to have 20% done each week (i.e. enough work done on it).
>>>
>>> In terms of which day, if we have to choose one, I'd say Tuesday. That's the most logical to me.
>>>
>>> WDYT? What do you prefer?
>>>
>>> Thanks
>>> -Vincent
>>>
>>>> On 30 Aug 2018, at 10:38, Thomas Mortagne <[email protected]> wrote:
>>>>
>>>> Indeed we discussed this but I don't see it in your mail Vincent.
>>>>
>>>> On Thu, Aug 30, 2018 at 10:33 AM, Adel Atallah <[email protected]> wrote:
>>>>> Hello,
>>>>>
>>>>> Maybe we should agree on having a whole day dedicated to using these tools with a maximum number of developers.
>>>>> That way we will be able to help each other and maybe it will make the process easier to carry out in the future.
>>>>>
>>>>> WDYT?
>>>>>
>>>>> Thanks,
>>>>> Adel
>>>>>
>>>>> On Wed, Aug 29, 2018 at 11:20 AM, Vincent Massol <[email protected]> wrote:
>>>>>> Hi devs (and anyone else interested in improving the tests of XWiki),
>>>>>>
>>>>>> History
>>>>>> ======
>>>>>>
>>>>>> It all started when I analyzed our global TPC (Test Percentage Coverage) and found that it was going down globally, even though we have the fail-build-on-jacoco-threshold strategy.
>>>>>>
>>>>>> I sent several email threads:
>>>>>>
>>>>>> - Loss of TPC: http://markmail.org/message/hqumkdiz7jm76ya6
>>>>>> - TPC evolution: http://markmail.org/message/up2gc2zzbbe4uqgn
>>>>>> - Improve our TPC strategy: http://markmail.org/message/grphwta63pp5p4l7
>>>>>>
>>>>>> Note: As a consequence of this last thread, I implemented a Jenkins Pipeline that sends us a mail when the global TPC of an XWiki module goes down, so that we fix it ASAP. This is still a work in progress: a first version is done and running at https://ci.xwiki.org/view/Tools/job/Clover/ but I need to debug and fix it (it's not working ATM).
>>>>>>
>>>>>> As a result of the global TPC going down/stagnating, I proposed to have 10.7 focused on Tests + BFD:
>>>>>> - Initially I proposed to focus on increasing the global TPC by looking at the reports from 1) above (http://markmail.org/message/qjemnip7hjva2rjd). See the last report at https://up1.xwikisas.com/#mJ0loeB6nBrAgYeKA7MGGw (we need to fix the red parts).
>>>>>> - Then, with the STAMP mid-term review, a bigger urgency surfaced and I asked if we could instead focus on fixing tests as reported by Descartes, to increase both coverage and mutation score (i.e. test quality), since those are 2 metrics/KPIs measured by STAMP, and since XWiki participates in STAMP we need to work on them and increase them substantially. See http://markmail.org/message/ejmdkf3hx7drkj52
>>>>>>
>>>>>> The results of XWiki 10.7 have been quite poor on test improvements (more focus on BFD than on tests, lots of devs on holidays, etc.). This forces us to adopt a different strategy.
>>>>>>
>>>>>> Full Strategy proposal
>>>>>> =================
>>>>>>
>>>>>> 1) As many XWiki SAS devs as possible (and anyone else from the community who's interested ofc! :)) should spend 1 day per week working on improving the STAMP metrics.
>>>>>> * Currently the agreement is that Thomas and myself will do this for the foreseeable future, till we get some good-enough metric progress.
>>>>>> * Some other devs from XWiki SAS will help out for XWiki 10.8 only FTM (Marius, Adel if he can, Simon in the future). The idea is to see where that could get us by using substantial manpower.
>>>>>>
>>>>>> 2) All committers: more generally, the global TPC failure strategy is also already active, and devs need to fix the modules that see their global TPC go down.
>>>>>>
>>>>>> 3) All committers: of course, the jacoco strategy is also active at each module level.
>>>>>>
>>>>>> STAMP tools
>>>>>> ==========
>>>>>>
>>>>>> There are 4 tools developed by STAMP:
>>>>>> * Descartes: Improves the quality of tests by increasing their mutation score. See http://markmail.org/message/bonb5f7f37omnnog and also https://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
>>>>>> * DSpot: Automatically generates new tests, based on existing tests. See https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot
>>>>>> * CAMP: Takes a Dockerfile and generates mutations of it, then deploys and executes tests on the software to see if the mutation works or not. Note that this currently doesn't fit the needs of XWiki and thus I've been developing another tool as an experiment (which may go back into CAMP one day), based on TestContainers, see https://massol.myxwiki.org/xwiki/bin/view/Blog/EnvironmentTestingExperimentations (a small illustrative sketch follows at the end of this section).
>>>>>> * EvoCrash: Takes a stack trace from production logs and generates a test that, when executed, reproduces the crash. See https://markmail.org/message/v74g3tsmflquqwra. See also https://github.com/SERG-Delft/EvoCrash
>>>>>>
>>>>>> Since XWiki is part of the STAMP research project, we need to use those 4 tools to increase the KPIs associated with them. See below.
>>>>>>
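>>>>>> To give a rough idea of what the TestContainers-based experiment looks like, here's a minimal hand-written sketch (this is NOT code from the actual experiment; the test class, image version and wiring are made up for illustration):
>>>>>>
>>>>>> import org.junit.ClassRule;
>>>>>> import org.junit.Test;
>>>>>>
>>>>>> import org.testcontainers.containers.MySQLContainer;
>>>>>>
>>>>>> import static org.junit.Assert.assertTrue;
>>>>>>
>>>>>> public class MySQLConfigurationTest
>>>>>> {
>>>>>>     // Starts a throwaway MySQL instance in Docker for the duration of the test class.
>>>>>>     @ClassRule
>>>>>>     public static MySQLContainer mysql = new MySQLContainer("mysql:5.7");
>>>>>>
>>>>>>     @Test
>>>>>>     public void executeTestsOnMySQL()
>>>>>>     {
>>>>>>         // In the real experiment, XWiki would be configured with mysql.getJdbcUrl(),
>>>>>>         // mysql.getUsername() and mysql.getPassword(), and the functional tests would
>>>>>>         // then run against that environment. Here we only check that the container is up.
>>>>>>         assertTrue(mysql.isRunning());
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> The same pattern would apply to the other configurations mentioned below (Oracle, cluster, LibreOffice, external SOLR, etc.): one image/container definition per configuration, and the same test suite executed against each of them.
>>>>>>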
>>>>>> Objectives/KPIs/Metrics for STAMP
>>>>>> ===========================
>>>>>>
>>>>>> The STAMP project has defined 9 KPIs that all partners (and thus XWiki) need to work on:
>>>>>>
>>>>>> 1) K01: Increase test coverage
>>>>>> * Global increase, by reducing the non-covered code by 40%. For XWiki, since we're at about 70%, this means reaching about 80% before the end of STAMP (i.e. before the end of 2019): 30% non-covered x 0.6 = 18% non-covered, i.e. roughly 82% coverage.
>>>>>> * Increase the coverage contribution of each tool developed by STAMP.
>>>>>>
>>>>>> Strategy:
>>>>>> * Primary goal:
>>>>>> ** Increase coverage by executing Descartes and improving our tests. This is http://markmail.org/message/ejmdkf3hx7drkj52 (see the small sketch below for what "improving a test" means in Descartes terms).
>>>>>> ** Don't do anything with DSpot. I'll do that part. Note that the goal is to write a Jenkins pipeline that automatically executes DSpot from time to time, commits the generated tests in a separate test source directory, and has our build execute both src/test/java and this new test source.
>>>>>> ** Don't do anything with TestContainers FTM since I need to finish a first working version. I may need help in the future to implement docker images for more configurations (on Oracle, in a cluster, with LibreOffice, with an external SOLR server, etc.).
>>>>>> ** For EvoCrash: we'll count the contributions of EvoCrash to coverage in K08.
>>>>>> * Secondary goal:
>>>>>> ** Increase our global TPC as mentioned above by fixing the modules in red.
>>>>>>
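>>>>>> To make the "improving our tests" part concrete, here's a made-up example (not real XWiki code). Descartes applies "extreme mutations": it replaces a method body with something like "return true;" or "return false;" (or empties it for void methods). A test that merely executes a method without asserting on its result kills none of those mutants, so the method is reported as pseudo-tested even though it's covered; strengthening the assertions is what raises the mutation score:
>>>>>>
>>>>>> import org.junit.Test;
>>>>>>
>>>>>> import static org.junit.Assert.assertFalse;
>>>>>> import static org.junit.Assert.assertTrue;
>>>>>>
>>>>>> public class BlankCheckerTest
>>>>>> {
>>>>>>     // Made-up class under test.
>>>>>>     public static class BlankChecker
>>>>>>     {
>>>>>>         public static boolean isBlank(String value)
>>>>>>         {
>>>>>>             return value == null || value.trim().isEmpty();
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     @Test
>>>>>>     public void isBlank()
>>>>>>     {
>>>>>>         // Weak version: executes the code (counts for coverage) but asserts nothing, so
>>>>>>         // the "return true;" and "return false;" extreme mutations both survive.
>>>>>>         BlankChecker.isBlank("  ");
>>>>>>
>>>>>>         // Strengthened version: assertions on the result kill both extreme mutations.
>>>>>>         assertTrue(BlankChecker.isBlank("  "));
>>>>>>         assertFalse(BlankChecker.isBlank("xwiki"));
>>>>>>     }
>>>>>> }
>>>>>>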
>>>>>> 2) K02: Reduce flaky tests
>>>>>> * Objective: reduce the number of flaky tests by 20%
>>>>>>
>>>>>> Strategy:
>>>>>> * Record flaky tests in JIRA
>>>>>> * Fix as many of them as possible
>>>>>>
>>>>>> 3) K03: Better test quality
>>>>>> * Objective: increase the mutation score by 20%
>>>>>>
>>>>>> Strategy:
>>>>>> * Same strategy as K01.
>>>>>>
>>>>>> 4) K04: More configuration-related paths tested
>>>>>> * Objective: increase the code coverage of configuration-related paths in our code by 20% (e.g. DB schema creation, cluster-related code, SOLR-related code, LibreOffice-related code, etc.)
>>>>>>
>>>>>> Strategy:
>>>>>> * Leave it to me FTM. The idea is to measure the Clover TPC with the base configuration, then execute all the other configurations (with TestContainers) and regenerate the Clover report to see how much the TPC has increased.
>>>>>>
>>>>>> 5) K05: Reduce system-specific bugs
>>>>>> * Objective: 30% improvement
>>>>>>
>>>>>> Strategy:
>>>>>> * Run TestContainers, execute the existing tests and find new bugs related to configurations. Record them.
>>>>>>
>>>>>> 6) K06: More configurations/Faster tests
>>>>>> * Objective: increase the number of automatically tested configurations by 50%
>>>>>>
>>>>>> Strategy:
>>>>>> * Increase the number of configurations we test with TestContainers. I'll do that part initially.
>>>>>> * Reduce the time it takes to deploy the software under a given configuration vs the time it used to take when done manually before STAMP. I'll do this one; I've already worked on it in the past year with the dockerization of XWiki.
>>>>>>
>>>>>> 7) K07: Pending, nothing to do FTM
>>>>>>
>>>>>> 8) K08: More crash-replicating test cases
>>>>>> * Objective: increase the number of crash-replicating test cases by at least 70%
>>>>>>
>>>>>> Strategy:
>>>>>> * For all issues that are still open and have stack traces, and for all issues closed but without tests, run EvoCrash on them to try to generate a test.
>>>>>> * Record and count the number of successful EvoCrash-generated test cases.
>>>>>> * Derive a regression test (which can be very different from the negative of the test generated by EvoCrash!). See the hypothetical sketch below for the distinction.
>>>>>> * Measure the new coverage increase.
>>>>>> * Note that I haven't experimented much with this yet myself.
>>>>>>
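>>>>>> To illustrate those last bullets, here's a purely hypothetical example (invented class and method names, and this is not actual EvoCrash output). Say a JIRA issue contains an NPE stack trace pointing at a serialize() method: EvoCrash would generate a test whose only purpose is to reproduce that crash, while the regression test we derive by hand (once the bug is fixed) asserts the intended behaviour:
>>>>>>
>>>>>> import org.junit.Test;
>>>>>>
>>>>>> import static org.junit.Assert.assertEquals;
>>>>>>
>>>>>> public class DocumentNameSerializerTest
>>>>>> {
>>>>>>     // Made-up class under test, shown here after the fix (null used to throw the
>>>>>>     // NullPointerException reported in the issue's stack trace).
>>>>>>     public static class DocumentNameSerializer
>>>>>>     {
>>>>>>         public String serialize(String documentName)
>>>>>>         {
>>>>>>             return documentName == null ? "" : documentName.trim();
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     // An EvoCrash-style generated test would essentially call
>>>>>>     //     new DocumentNameSerializer().serialize(null);
>>>>>>     // and check that the NullPointerException from the stack trace is thrown, which
>>>>>>     // proves the crash is reproduced on the unfixed code. That's not the test we keep.
>>>>>>
>>>>>>     // The regression test derived by hand asserts what the fixed code should do, which
>>>>>>     // is more than just "the exception is gone".
>>>>>>     @Test
>>>>>>     public void serializeWithNullDocumentName()
>>>>>>     {
>>>>>>         assertEquals("", new DocumentNameSerializer().serialize(null));
>>>>>>     }
>>>>>> }
>>>>>>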
>>>>>> 9) K09: Pending, nothing to do FTM.
>>>>>>
>>>>>> Conclusion
>>>>>> =========
>>>>>>
>>>>>> Right now, I need your help for the following KPIs: K01, K02, K03, K08.
>>>>>>
>>>>>> Since there's a lot to understand in this email, I'm open to:
>>>>>> * Organizing a meeting on youtube live to discuss all this
>>>>>> * Answering any questions on this thread ofc
>>>>>> * Also feel free to ask on IRC/Matrix.
>>>>>>
>>>>>> Here's an extract from STAMP which has more details about the KPIs/metrics: https://up1.xwikisas.com/#QJyxqspKXSzuWNOHUuAaEA
>>>>>>
>>>>>> Thanks
>>>>>> -Vincent
>>>>
>>>> --
>>>> Thomas Mortagne

